I’m not saying that the Patriots are out-and-out liars. But they are outliers.
The advantage of an underinflated ball, like eleven of the twelve footballs the Patriots used last Sunday, is that it’s easier to grip. Ball carriers are less likely to fumble when they can sink their fingers into the ball.
We can’t go back and measure the pressure of balls the Patriots were using before the Colts game, but Warren Sharp (here) went back and dug up the data on fumbles for all NFL games since 2010. Since a team that controls the ball and runs more plays has more chances to fumble, Sharp graphed the ratio of plays to fumbles (values in red squares in the chart below) along with the absolute number of fumbles (values in blue circles). The higher the ratio, the less fumble-prone the team was.
One of these things is not like the others. That’s what an outlier is. It’s off the charts. It’s nowhere near the trend line. Something about it is very different. The variables that might explain the differences among the other data points – better players, better weather or a domed stadium, a pass-centered offense – don’t apply. Something else is going on.
As the graph shows, when the teams are rank-ordered on the plays/fumbles ratio, the gap between one team and the next is usually 0-2; there are only two gaps of 5 before the 9-point gap between #3 Atlanta and #2 Houston. From the second-best Texans to the Patriots, there’s a 47-point jump.
Sharp also graphed the data as a histogram.
It’s pretty much a bell curve centered around the mean of 105 plays-per-fumble. Except for that outlier. And the chart shows just how far out it lies.
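One standard way to quantify “how far out it lies” is a z-score: how many standard deviations a value sits from the mean of the other observations. Here is a minimal sketch using hypothetical plays-per-fumble ratios (illustrative numbers only, not Sharp’s actual figures) clustered near the league mean of about 105, plus one team far out in the tail:

```python
import statistics

# Hypothetical plays-per-fumble ratios for 31 teams (illustrative only):
typical = [81, 85, 88, 90, 92, 93, 94, 95, 96, 97, 98, 98, 99, 100,
           101, 102, 103, 104, 105, 107, 108, 110, 112, 114, 116, 118,
           121, 124, 127, 133, 145]
outlier = 180  # the hypothetical 32nd team, far above the rest

mean = statistics.mean(typical)
sd = statistics.stdev(typical)
z = (outlier - mean) / sd  # standard deviations above the others' mean

print(f"mean = {mean:.1f}, sd = {sd:.1f}, z-score of outlier = {z:.1f}")
```

A common rule of thumb flags anything beyond about 3 standard deviations as an outlier; the point of Sharp’s histogram is that the Patriots’ value sits well past that threshold relative to the rest of the league.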
The Patriots play in a cold-weather climate in a stadium exposed to the elements. Yet their plays/fumble ratio is 50% higher than that of the Packers, 80% higher than the Bears. They have good players, but those players fumble less often for the Patriots than they did when they played for other NFL teams.
Usually, the statistical anomaly comes first – someone notices that US healthcare costs are double those of other nations – and then people try to come up with explanations. In this case, it wasn’t until we had a possible explanatory variable that researchers went back and found the outlier. As Peter Sagal of “Wait, Wait, Don’t Tell Me” said, “The League became suspicious when a Patriots player scored a touchdown and instead of spiking the ball he just folded it and put it in his pocket.”
UPDATE, Jan. 28: Since I posted this, there has been some discussion of Sharp’s data (“discussion” is a euphemism – this is sports and the Internet, after all). If you’re really interested in pursuing this, try Advanced Football Analytics or this piece at Deadspin, “Why Those Statistics About The Patriots’ Fumbles Are Mostly Junk” (to repeat, “discussion” is a euphemism, and if you want more strongly voiced views, read the comments). One difficulty, I suspect, is that a fumble is a rare event. The difference between the teams with the surest grip and the most butterfingered is about one fumble every couple of games.
Cross-posted at Montclair SocioBlog.
Every year, at the first faculty meeting, representatives of the registrar tell us what percentage of the incoming class is [insert variable in which we are interested, such as American Indian, working class, international, etc]. They compare it to last year’s percentage. This drives me crazy because they do so as if comparing the last two data points in a sequence is indicative of a trend. But determining whether there is a trend, and therefore whether the increase or decrease in the percentage of [insert variable in which we are interested] relative to last year is significant, requires more than two data points!
xkcd does an excellent job of illustrating just how two data points can be utterly meaningless, even wildly fallacious:
Originally posted in 2009.
In statistics, a little star next to a coefficient generally means that the result is statistically significant at the p<.05 level. In English, this means that if there were no real relationship, a result at least this extreme would turn up by chance less than 1 time in 20. In sociology, that’s generally considered good enough to conclude that the finding is “real.”
If one investigates a lot of relationships, however, this way of deciding which ones to claim as real has an obvious pitfall. If you look at 20 possible but false relationships, chances are that one of them will be statistically significant by chance alone. Do enough fishing in a dead lake, in other words, and you’ll inevitably pull up some garbage.
Thanks xkcd, for making this funny:
At Junk Charts, Kaiser Fung drew my attention to a graph released by Reuters. It is so deeply misleading that I am loath to expose your eyeballs to it. So, I offer you this:
The original figure is on the left. It counts the number of gun deaths in Florida. A line rises, bounces a little, reaches its second-highest peak, labeled “2005, Florida enacted its ‘Stand Your Ground’ law,” and falls precipitously.
What do you see?
Most people see a huge fall-off in the number of gun deaths after Stand Your Ground was passed. But that’s not what the graph shows. A quick look at the vertical axis reveals that the gun deaths are counted from top (0) to bottom (800). The highest peaks are the fewest gun deaths and the lowest ones are the most. A rise in the line, in other words, reveals a reduction in gun deaths. The graph on the right — flipped both horizontally and vertically — is more intuitive to most: a rising line reflects a rise in the number of gun deaths and a falling line a drop.
The proper conclusion, then, is that gun deaths skyrocketed after Stand Your Ground was enacted.
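The trick is easy to reproduce. This sketch, using made-up yearly counts (not the real Florida figures), plots the same data twice — once with the y-axis inverted so 0 sits at the top, as in the Reuters chart, and once conventionally:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical yearly gun-death counts (illustrative only): deaths
# jump sharply after 2005.
years = list(range(2000, 2011))
deaths = [500, 510, 490, 505, 495, 480, 620, 700, 740, 720, 730]

fig, (misleading, conventional) = plt.subplots(1, 2, figsize=(8, 3))

misleading.plot(years, deaths)
misleading.invert_yaxis()  # 0 at the top, as in the Reuters chart
misleading.set_title("Inverted axis: line falls as deaths rise")

conventional.plot(years, deaths)
conventional.set_title("Conventional axis: rising line = more deaths")

fig.savefig("axis_comparison.png")
```

Same numbers, opposite visual impressions — which is the whole problem.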
This example is a great reminder that we bring our own assumptions to our reading of any illustration of data. The original graph may have broken convention, making the intuitive read of the image incorrect, but the data is, presumably, sound. It’s our responsibility, then, to always do our due diligence in absorbing information. The alternative is to be duped.
Cross-posted at Pacific Standard.
In a great book, The Averaged American, sociologist Sarah Igo uses case studies to tell the intellectual history of statistics, polling, and sampling. The premise is fascinating: Today we’re bombarded with statistics about the U.S. population, but this is a new development. Before the science developed, the concept was elusive and the knowledge was impossible. In other words, before statistics, there was no “average American.”
The implication here is, of course, that Black Americans aren’t “real” Americans and that including them in opinion poll data is literally skewing the results.
Scientists designed the famous Middletown study with exactly this mentality. Trying to determine who the average American was, scientists excluded Black Americans out of hand. Of course, that was in the 1920s and ’30s. How wild to see the same mentality in the 2000s.
Originally posted in 2009.
First, there were the accolades. More than 100 instances of street harassment in a two-minute video, testifying powerfully to the routine invasion of women’s lives by male strangers.
Then, there was the criticism. How is it, people asked, that the majority of the men are black? They argued: this video isn’t an indictment of men, it’s an indictment of black men.
Now, we’ve reached the third stage: lessons for research methods classes.
1. Black men really do catcall more than other kinds of men.
2. The people who made this video are unconsciously or consciously racist, editing out men of other races.
3. The study was badly designed.
As Tufekci points out, any one of these could account for why so many of the catcallers were black. Likewise, all three could be at play at once.
Enter, the data wrangler: Chris Moore at Mass Appeal.
Moore and his colleagues looked for landmarks in the video in order to place every instance of harassment on the map of New York City. According to their analysis, over half of the harassment occurs on just one street — 125th — in Harlem.
Did the time the producers spent in Harlem expose them to denser rates of harassment, supporting hypothesis #1? Did they spend extra time in Harlem because they have something against black men? That would be hypothesis #2. Or is it hypothesis #3: were they simply thoughtless about where they chose to film?
Honestly, it’s hard to say without more data, such as knowing how much time they spent in each neighborhood and in neighborhoods not represented in the video. But if it’s true that they failed to sample the streets of New York City in any meaningful way — and I suspect it is — then hypothesis #3 explains at least some of why black men are over-represented.
And that fact should motivate us all to do our methods right. If we don’t, we may end up offering accidental and fallacious support to ideas that we loathe.
Cross-posted at Pacific Standard.