Every year, at the first faculty meeting, representatives of the registrar tell us what percentage of the incoming class is [insert variable in which we are interested, such as American Indian, working class, international, etc]. They compare it to last year’s percentage. This drives me crazy because they do so as if comparing the last two data points in a sequence is indicative of a trend. But determining whether there is a trend, and therefore whether this year’s increase or decrease in the percentage of [insert variable in which we are interested] is meaningful relative to last year, requires more than two data points!
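A quick simulation makes the point. The numbers here are invented for illustration: suppose the true share of the group of interest is a constant 10% and each incoming class has 500 students. Even with nothing changing underneath, sampling noise alone makes the observed percentage rise some years and fall others:

```python
import random

random.seed(0)

# Hypothetical setup: the true underlying rate is a constant 10% and each
# incoming class has 500 students. Both numbers are invented for illustration.
def class_percentage(true_rate=0.10, class_size=500):
    members = sum(random.random() < true_rate for _ in range(class_size))
    return members / class_size

# Ten years of "data" drawn from an unchanging process.
pcts = [class_percentage() for _ in range(10)]
for year, p in enumerate(pcts, start=1):
    print(f"year {year}: {p:.1%}")

# The percentage goes up some years and down others even though the
# underlying rate never moved.
```

Comparing only this year to last amounts to reading meaning into one of these random wiggles.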

xkcd does an excellent job of illustrating just how extrapolating from two data points can be utterly meaningless, even wildly misleading:

[xkcd comic: “Extrapolating”]

Other great xkcd cartoons: attribution and the in group, on statistical significance, correlation or causation, and the minimal group paradigm.

Originally posted in 2009.

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

In statistics, a little star next to a coefficient generally means that the result is statistically significant at the p<.05 level. In English, this means that if there were really no relationship, a result this strong would pop up by pure random chance only about 1 in 20 times. In sociology, that’s generally considered good enough to conclude that the finding is “real.”

If one investigates a lot of relationships, however, this way of deciding which ones to claim as real has an obvious pitfall.  If you look at 20 possible but false relationships, chances are that one of them will be statistically significant by chance alone. Do enough fishing in a dead lake, in other words, and you’ll inevitably pull up some garbage.
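The fishing-trip arithmetic is easy to check. Under a true null hypothesis a p-value is uniformly distributed, so the chance that at least one of 20 null tests clears p<.05 is 1 − 0.95^20, about .64. A quick simulation sketch (the trial count is arbitrary):

```python
import random

random.seed(1)

# Under a true null hypothesis, a p-value is uniformly distributed on [0, 1].
# Each "fishing trip" tests 20 relationships that are all false and reports
# whether at least one comes out "significant" at p < .05.
def fishing_trip(n_tests=20, alpha=0.05):
    return any(random.random() < alpha for _ in range(n_tests))

trials = 100_000
rate = sum(fishing_trip() for _ in range(trials)) / trials

# Theory says 1 - 0.95**20, about 0.64: roughly two out of three dead lakes
# yield at least one piece of garbage.
print(f"share of trips with a false positive: {rate:.2f}")
```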

Thanks xkcd, for making this funny:

[xkcd comic on statistical significance]


At Junk Charts, Kaiser Fung drew my attention to a graph released by Reuters. It is so deeply misleading that I am loath to expose your eyeballs to it. So, I offer you this:

The original figure is on the left. It counts the number of gun deaths in Florida. A line rises, bounces a little, reaches its second-highest peak, labeled “2005, Florida enacted its ‘Stand Your Ground’ law,” and falls precipitously.

What do you see?

Most people see a huge fall-off in the number of gun deaths after Stand Your Ground was passed. But that’s not what the graph shows. A quick look at the vertical axis reveals that the gun deaths are counted from top (0) to bottom (800). The highest peaks represent the fewest gun deaths and the lowest points the most. A rise in the line, in other words, reveals a reduction in gun deaths. The graph on the right — flipped both horizontally and vertically — is more intuitive to most: a rising line reflects a rise in the number of gun deaths and a falling line a drop.

The proper conclusion, then, is that gun deaths skyrocketed after Stand Your Ground was enacted.
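The trick is easy to reproduce. Here is a minimal matplotlib sketch, using invented numbers rather than the actual Florida data, showing how simply reversing the y-axis limits makes a rise in deaths look like a fall:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Invented, illustrative values only -- not the real Florida figures.
years = [1999, 2002, 2005, 2008, 2012]
deaths = [550, 500, 450, 650, 800]

fig, (ax_bad, ax_good) = plt.subplots(1, 2)

# Misleading version: the y-axis runs from 0 at the top to 800 at the
# bottom, so the line falls as deaths rise.
ax_bad.plot(years, deaths)
ax_bad.set_ylim(800, 0)
ax_bad.set_title("inverted axis")

# Conventional version: zero at the bottom, a rising line means more deaths.
ax_good.plot(years, deaths)
ax_good.set_ylim(0, 800)
ax_good.set_title("conventional axis")

fig.savefig("gun_deaths.png")
```

Identical data, opposite first impressions.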

This example is a great reminder that we bring our own assumptions to our reading of any illustration of data.  The original graph may have broken convention, making the intuitive read of the image incorrect, but the data is, presumably, sound.  It’s our responsibility, then, to always do our due diligence in absorbing information.  The alternative is to be duped.

Cross-posted at Pacific Standard.



By xkcd.


Flashback Friday.

In a great book, The Averaged American, sociologist Sarah Igo uses case studies to tell the intellectual history of statistics, polling, and sampling. The premise is fascinating:  Today we’re bombarded with statistics about the U.S. population, but this is a new development.  Before the science developed, the concept was elusive and the knowledge was impossible. In other words, before statistics, there was no “average American.”

There are lots of fascinating insights in her book, but a post by Byron York brought one in particular to mind.  Here’s a screenshot of his opening lines (emphasis added by Jay Livingston):

[Screenshot of Byron York’s opening lines]

The implication here is, of course, that Black Americans aren’t “real” Americans and that including them in opinion poll data is literally skewing the results.

Scientists designed the famous Middletown study with exactly this mentality. Trying to determine who the average American was, the researchers excluded Black Americans out of hand. Of course, that was in the 1920s and ’30s. How wild to see the same mentality in the 2000s.

Originally posted in 2009.


First, there were the accolades. More than 100 instances of street harassment in a two-minute video, testifying powerfully to the routine invasion of women’s lives by male strangers.

Then, there was the criticism. How is it, people asked, that the majority of the men are black? They argued: this video isn’t an indictment of men, it’s an indictment of black men.

Now, we’ve reached the third stage: lessons for research methods classes.

Our instructor is sociologist Zeynep Tufekci, writing at The Message. Our competing hypotheses are three:

1. Black men really do catcall more than other kinds of men.

2. The people who made this video are unconsciously or consciously racist, editing out men of other races.

3. The study was badly designed.

As Tufekci points out, any one of these could account for why so many of the catcallers were black. Likewise, all three could be at play at once.

Enter, the data wrangler: Chris Moore at Mass Appeal.

Moore and his colleagues looked for landmarks in the video in order to place every instance of harassment on the map of New York City. According to their analysis, over half of the harassment occurs on just one street — 125th — in Harlem.

[Map of the harassment locations in New York City]

Did the time the producers spent in Harlem capture genuinely denser rates of harassment there, supporting hypothesis #1? Did they spend an extra amount of time in Harlem because they have something against black men? That would be hypothesis #2. Or is it hypothesis #3: were they simply thoughtless about their decisions as to where they would do their filming?

Honestly, it’s hard to say without more data, such as knowing how much time they spent in each neighborhood and in neighborhoods not represented in the video. But if it’s true that they failed to sample the streets of New York City in any meaningful way — and I suspect it is — then hypothesis #3 explains at least some of why black men are over-represented.
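A little expected-value arithmetic shows how sampling choices alone can skew who ends up on camera. Every number below is invented for illustration, and the sketch assumes harassment rates are identical everywhere, so any skew comes purely from where the camera spent its time:

```python
# Hypothetical neighborhoods: (share of filming time, share of local men
# who are black). All numbers are invented for illustration.
neighborhoods = {
    "Neighborhood A": (0.55, 0.60),
    "Neighborhood B": (0.15, 0.10),
    "Neighborhood C": (0.15, 0.15),
    "Neighborhood D": (0.15, 0.10),
}

# Assume harassment rates are identical everywhere, so who gets filmed
# depends only on where the camera is pointed.
observed = sum(time * black for time, black in neighborhoods.values())

# What the sample would look like if filming time were spread evenly.
even = sum(black for _, black in neighborhoods.values()) / len(neighborhoods)

print(f"time-weighted sample: {observed:.1%} black")
print(f"evenly sampled:       {even:.1%} black")
```

Under these made-up numbers, spending over half the filming time in one neighborhood inflates that neighborhood’s demographics in the final tally even though behavior is identical everywhere — hypothesis #3 in action.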

And that fact should motivate us all to do our methods right. If we don’t, we may end up offering accidental and fallacious support to ideas that we loathe.

Cross-posted at Pacific Standard.



Saturday Morning Breakfast Cereal, by Zach Weiner.


[xkcd comic: “Correlation”]

By xkcd.

Originally posted in 2009.
