Tag Archives: methods/use of data

Just for Fun: Is Truncating the Y-Axis Dishonest?

What do you think?

B3ZPiyZCMAAjU0s

Thanks to @WyoWeeds!

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

Just for Fun: But… Can One Be Too Meta?

3Thanks xkcd.

The Patriots: Out-and-Out Liars? Or Just Outliers?

I’m not saying that the Patriots are out-and-out liars. But they are outliers.

The advantage of an underinflated ball, like the eleven of the twelve footballs the Patriots used last Sunday, is that it’s easier to grip. Ball carriers will be less likely fumble if they’re gripping a ball they can sink their fingers into.

We can’t go back and measure the pressure of balls the Patriots were using before the Colts game, but Warren Sharp (here) went back and dug up the data on fumbles for all NFL games since 2010.  Since a team that controls the ball and runs more plays has more chances to fumble, Sharp graphed the ratio of plays to fumbles (values in red squares in the chart below) along with the absolute number of fumbles (values in blue circles). The higher the ratio, the less fumble-prone the team was.

1

One of these things is not like the others.  That’s what an outlier is. It’s off the charts. It’s nowhere near the trend line. Something about it is very different. The variables that might explain the differences among the other data points – better players, better weather or a domed stadium, a pass-centered offense – don’t apply. Something else is going on.

As the graph shows, when the teams are rank ordered on the plays/fumbles ratio, the difference between one team and the next higher is usually 0-2, there are only two gaps of 5 until the 9-point gap between #3 Atlanta and #2 Houston. From the second-best Texans and to the Patriots there’s a 47-point jump.

Sharp also graphed the data as a histogram.

1 (4)

It’s pretty much a bell curve centered around the mean of 105 plays-per-fumble. Except for that outlier. And the chart shows just how far out it lies.

The Patriots play in a cold-weather climate in a stadium exposed to the elements.  Yet their plays/fumble ratio is 50% higher than that of the Packers, 80% higher than the Bears. They have good players, but those players fumble less often for the Patriots than they did when they played for other NFL teams.

Usually, the statistical anomaly comes first – someone notices that US healthcare costs are double those of other nations – and then people try to come up with explanations.  In this case, it wasn’t until we had a possible explanatory variable that researchers went back and found the outlier. As Peter Sagal of “Wait, Wait, Don’t Tell Me” said, “The League became suspicious when a Patriots player scored a touchdown and instead of spiking the ball he just folded it and put it in his pocket.”

UPDATE, Jan. 28: Since I posted this, there has been some discussion of Sharp’s data (“discussion” is a euphemism – this is sports and the Internet after all). If you’re really interested in pursuing this, try Advanced Football Analytics  or this piece  at Deadspin “Why Those Statistics About The Patriots’ Fumbles Are Mostly Junk,” (to repeat, “discussion” is a euphemism, and if you more strongly voiced views, read the comments). One of the difficulties I suspect is that a fumble is a rare event. The difference between the teams with the surest grip and the most butterfingered is about one fumble every couple of games.

Cross-posted at Montclair SocioBlog.

Jay Livingston is the chair of the Sociology Department at Montclair State University. You can follow him at Montclair SocioBlog or on Twitter.

Just for Fun: The Folly of Two Data Points

Every year, at the first faculty meeting, representatives of the registrar tell us what percentage of the incoming class is [insert variable in which we are interested, such as American Indian, working class, international, etc].  They compare it to last year’s percentage.  This drives me crazy because they do so as if comparing the last two data points in a sequence is indicative of a trend. But to determine whether or not there is a trend, and therefore whether the increase or decrease in the percentage of [insert variable in which we are interested] significant relative to last year, depends on more than two data points!

xkcd does an excellent job of illustrating just how two data points can be utterly meaningless, even wildly fallacious:

extrapolating

Other great xkcd cartoons: attribution and the in group, on statistical significance, correlation or causation, and the minimal group paradigm.

Originally posted in 2009.

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

Just For Fun: The Trouble with p<.05

In statistics, a little star next to a coefficient generally means that the result is statistically significant at the p<.05 level. In English, this means that there is only a 1 in 20 chance that the finding just popped up by pure random chance. In sociology, that’s generally considered good enough to conclude that the finding is “real.”

If one investigates a lot of relationships, however, this way of deciding which ones to claim as real has an obvious pitfall.  If you look at 20 possible but false relationships, chances are that one of them will be statistically significant by chance alone. Do enough fishing in a dead lake, in other words, and you’ll inevitably pull up some garbage.

Thanks xkcd, for making this funny:

1 (3)

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

How to Lie with Statistics: Stand Your Ground and Gun Deaths

2At Junk Charts, Kaiser Fung drew my attention to a graph released by Reuters.  It is so deeply misleading that I loathe to expose your eyeballs to it.  So, I offer you this:

1The original figure is on the left.  It counts the number of gun deaths in Florida.  A line rises, bounces a little, reaches a 2nd highest peak labeled “2005, Florida enacted its ‘Stand Your Ground’ law,” and falls precipitously.

What do you see?

Most people see a huge fall-off in the number of gun deaths after Stand Your Ground was passed.  But that’s not what the graph shows.  A quick look at the vertical axis reveals that the gun deaths are counted from top (0) to bottom (800).  The highest peaks are the fewest gun deaths and the lowest ones are the most.  A rise in the line, in other words, reveals a reduction in gun deaths.  The graph on the right — flipped both horizontally and vertically — is more intuitive to most: a rising line reflects a rise in the number of gun deaths and a dropping a drop.

The proper conclusion, then, is that gun deaths skyrocketed after Stand Your Ground was enacted.

This example is a great reminder that we bring our own assumptions to our reading of any illustration of data.  The original graph may have broken convention, making the intuitive read of the image incorrect, but the data is, presumably, sound.  It’s our responsibility, then, to always do our due diligence in absorbing information.  The alternative is to be duped.

Cross-posted at Pacific Standard.

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

Just for Fun: Data is Plural, Except When It Isn’t

2

By xkcd.

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

Excluding Blacks From The National Collective

Flashback Friday.

In a great book, The Averaged American, sociologist Sarah Igo uses case studies to tell the intellectual history of statistics, polling, and sampling. The premise is fascinating:  Today we’re bombarded with statistics about the U.S. population, but this is a new development.  Before the science developed, the concept was elusive and the knowledge was impossible. In other words, before statistics, there was no “average American.”

There are lots of fascinating insights in her book, but a post by Byron York brought one in particular to mind.  Here’s a screenshot of his opening lines (emphasis added by Jay Livingston):

00_actually

The implication here is, of course, that Black Americans aren’t “real” Americans and that including them in opinion poll data is literally skewing the results.

Scientists designed the famous Middletown study with exactly this mentality.  Trying to determine who the average American was, scientists excluded Black Americans out of hand.  Of course, that was in the 1920s and ’30s.  How wild to see the same mentality in the 2000s.

Originally posted in 2009.

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.