Category Archives: graphic comparison

Visualizing email traffic

Editing process in graphic design

The editing process in graphic design is somewhat different than the editing process in writing. Writers tend to start with a skeleton, make sure the bones are all in the right places, and then slowly add and sculpt musculature and skin through iterative processes. Graphic designers start with a whole bunch of skeletons, subtract a few, add musculature to the rest, subtract a few of those, add skin to the remaining ones, and then only late in the process will a single design go through a final polishing process.

One of the ways social scientists teach students to become skeptical about the things they read is by teaching them how to edit their own work and the work of others. Students start to see how pieces of written work represent a series of choices. They see that what they’ve read could have gone in other conceptual directions, used different evidence, been shortened, lengthened, stripped of jargon, or otherwise constructed and styled in new ways that could have changed the meanings taken away by the readers. Learning to construct, critique, and polish writing is a major part of how readers develop the tools they need to understand and analyze the works they read.

There is far less educational time spent teaching students how to create visual work, especially visual work outside of the realm of personal expression. (I feel like most arts programs emphasize personal expression, which is different from creating visual work with the intent of displaying data or even political messaging.) It is not surprising that we end up with a bunch of people who struggle to apply an analytic lens to information graphics. This leads to a communications power imbalance that privileges certain kinds of visual devices, including information graphics, over writing: information graphics are more likely to be accepted without much scrutiny because most folks do not have a good idea where to begin scrutinizing them. Information graphics combine the moral authority of numbers with the cognitive inertia of sight that lies behind the cliche that ‘seeing is believing’.

In the service of pulling back the curtain on graphic design, I thought it might be useful to save an entire series of drafts in the development process of a graphic that describes the email traffic in a small design work group. The purpose is to break the seal around the image and reveal that it is a series of decisions that might easily have been otherwise.

First Draft

First, I thought a stem and leaf diagram might work.

Stem and Leaf diagrams of office email traffic

But these graphics failed because there was no way to keep strings of receiving or sending visually united. If the people in the office happened to be sending (or receiving) a series of emails that spanned from one ten-minute period into the next, that run would be visually broken. I also wasn’t thrilled with the way the sent email matched up with the received email. It was hard to see that when one person in the office sent an email, it would often land in the inbox of someone else in the office.

Still, I liked the version where I turned the numbers into balls and that idea came back in a different form later in the development process.

Second Draft

I decided to abandon the stem and leaf for a timeline. I initially imagined triangles as markers for the email because I thought the shape would indicate the directionality of an email going out into the internet.

email traffic timeline, version 1

This version has an entire day on one page, morning sits above afternoon.

And I tried some different color schemes.

Email traffic timeline, version 1.1

Email traffic timeline, version 1.1 stretching the day across two pages.

Email traffic timeline, version 1.2

The triangles did not work and some of the color schemes created a sense of vibration. A trained graphic designer might have tried the triangles (and rejected them, of course), but they would not have made the mistakes with color that I did.

Third draft

I replotted the graphic with circles, not triangles, and added up all the emails that were received in 5-minute periods instead of plotting each individually. This lost a bit of granularity, but it made it easier to see where traffic was greatest because it allowed the height of the circles to draw the eye.
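
For anyone who wants to try the same aggregation, here is a minimal sketch of adding up emails in 5-minute periods. It is written in Python purely for illustration and is not the tool I used for the graphic; the timestamps and the `bin_emails` helper are made up for the example.

```python
from collections import Counter
from datetime import datetime

def bin_emails(timestamps, minutes=5):
    """Count emails per fixed-width time bin (hypothetical helper).

    Assumes `minutes` divides 60 evenly, as 5 does.
    """
    counts = Counter()
    for ts in timestamps:
        # Round each timestamp down to the start of its bin.
        floored = ts.replace(minute=ts.minute - ts.minute % minutes,
                             second=0, microsecond=0)
        counts[floored] += 1
    return dict(sorted(counts.items()))

# Made-up received times for one person on one morning.
received = [
    datetime(2012, 11, 5, 9, 1),
    datetime(2012, 11, 5, 9, 3),
    datetime(2012, 11, 5, 9, 4),
    datetime(2012, 11, 5, 9, 12),
    datetime(2012, 11, 5, 9, 14),
    datetime(2012, 11, 5, 9, 31),
]

bins = bin_emails(received)
for start, n in bins.items():
    # One row per bin; the count is what would drive the circle's height.
    print(start.strftime("%H:%M"), "o" * n)
```

The trade-off is the same one described above: the individual send times disappear, but the busy bins stand out at a glance.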

Email timeline, version 1.3

There is another page to the right of this one but viewing the image at this scale displays more detail.

This version is much closer to the final but something was missing.

Fourth draft

I started to realize that the timelines were difficult to analyze so I went back to the data and pulled out some summary statistics about the average number of emails each person sent and received. I also thought it would be interesting to see how much of the officewide traffic each person generated. While I was looking for new ways to help people understand what they were looking at, I also showed them the range of reality in the same timeline format by pulling out the lines for the highest traffic person-day and the lowest traffic person-day. I also remembered one of the lessons I learned from reading Nathan Yau’s Visualize This and added some descriptive text. [A full review of that book is here.]
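
The summary statistics are simple enough to sketch. The snippet below is an illustration in Python with invented names and counts, not the office's real data: it computes each person's average emails sent per day, each person's share of officewide traffic, and the highest and lowest traffic person-days.

```python
# Made-up daily sent counts per person (names and numbers are illustrative).
sent_per_day = {
    "Designer A": [12, 18, 9],
    "Designer B": [30, 25, 35],
    "Designer C": [5, 7, 3],
}

# Average emails sent per day for each person.
averages = {name: sum(days) / len(days) for name, days in sent_per_day.items()}

# Each person's share of all email sent in the office.
office_total = sum(sum(days) for days in sent_per_day.values())
shares = {name: sum(days) / office_total for name, days in sent_per_day.items()}

# Every (count, person, day index) combination is one person-day.
person_days = [
    (count, name, day)
    for name, days in sent_per_day.items()
    for day, count in enumerate(days)
]
busiest = max(person_days)   # highest traffic person-day
quietest = min(person_days)  # lowest traffic person-day

for name in sent_per_day:
    print(f"{name}: {averages[name]:.1f} per day, {shares[name]:.0%} of office traffic")
```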

Office email traffic

Office email traffic

This is as far as I have gotten. But if I get good suggestions in the comments, I’ll keep improving.

What can writers learn from graphic designers

Getting through this many drafts alone was hard. It is very hard to see the same thing with new eyes. I got some help from two different people and even though neither of them said much, their opinions made a huge difference in the process. I encourage writers to find a way to share their work with others earlier in the process. It is humbling. If the comparison to graphic design is apt, earlier sharing either of the whole draft or of smaller sections will also likely lead to a stronger piece that gets written faster.

Stem and leaf diagrams

Stem and leaf diagram becomes a histogram

What works

The stem and leaf diagram is an old stand-by that has largely been abandoned in social science as it morphed into the histogram. It is a rather ingenious graphical device that could be created even with a typewriter, which is how people used to prepare documents not that long ago. And when I say ‘people’ used to prepare documents, I am actually imagining wives and girlfriends of the husbands and boyfriends who were preparing final drafts of their dissertations and later the (mostly female) secretaries, administrators, and lab assistants typing up articles and figures for (mostly male) professors. [Refer to this graphic on the gendered nature of degrees at the doctoral level for supporting evidence that it was mostly men writing dissertations and then getting the jobs available to people who had written dissertations.]

How to make a stem and leaf diagram

1. Start with numerical data. Organize it from least to greatest.

2. Think of each number as having a stem and a leaf. The stem is the more durable part of the number and the leaf is the more sensitive part. For a number like 57, the more durable part is the ‘5’ because even if there were some variation in the measure, the digit in the tens place might not change, but the ‘7’ in the ones place is more sensitive and thus more likely to flutter like a leaf. If we were measuring temperature, for instance, it would be a lot more likely that the day would have temperatures like 56 and 58 than 60-something and 40-something. Thus, the tens place is the stem and the ones place is the leaf in this case. It would be possible to use measurements in the hundreds or even thousands.

3. Once you have identified your stems and leaves, type the lowest stem value. Then type a bar or some other vertical device to separate your stem from your leaves. Then look at all the observations you have for that stem value. Type in every single observed leaf value for that stem, starting with the lowest one. So if you are creating a diagram of all the temperatures registered at noon for the month of November, you will have 30 values to stick in your chart. You will probably have something like three values in the 30s (say, 35, 37, and 38). This would mean you would type a 3, then a vertical bar, then 5, 7, and 8. If there were also nine values in the 40s (say, 40, 41, 42, 42, 43, 45, 45, 46, and 48), you would hit carriage return. Then you’d type a 4, a vertical bar, and 0 1 2 2 3 5 5 6 8. You see how people (mostly women) could use typewriters to make graphics.

The strength of this technique is that it forces the actual dataset into a visually organized diagram. All of the values can be read right out of the graph but the device as a whole gives an impression of the overall pattern.

4. At some point after typewriters, the stem and leaf diagram morphed into a histogram. I think Excel had something to do with this, but I am still researching just how it was that the stem and leaf diagram was relegated to the dustbin while the histogram rose to take its place.
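
Steps 1 through 3 can be sketched in a few lines of code. This is a minimal illustration in Python, assuming non-negative whole numbers with the last digit as the leaf; the `stem_and_leaf` function name is my own invention, and the data are the twelve example temperatures from step 3.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem and leaf diagram for whole numbers.

    The stem is everything but the last digit; the leaf is the last digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):          # step 1: order least to greatest
        stem, leaf = divmod(v, 10)    # step 2: split into stem and leaf
        leaves[stem].append(leaf)
    # Step 3: one row per stem, lowest stem first, leaves in ascending order.
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}"
        for stem in sorted(leaves)
    ]

temps = [35, 37, 38, 40, 41, 42, 42, 43, 45, 45, 46, 48]
lines = stem_and_leaf(temps)
print("\n".join(lines))
# 3 | 5 7 8
# 4 | 0 1 2 2 3 5 5 6 8
```

Note that this sketch skips stems with no observations, so a gap in the data would not show up as an empty row the way it would on a carefully typed diagram.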

Worth thinking about

Stem and leaf diagrams are close cousins of bar charts and histograms. While bar charts and histograms might be more attractive in some ways, they are, in fact, less data-rich. It is not possible to read the actual values out of a colored bar. The histogram *could* be more visually pleasing than the stem and leaf diagram, but because histograms allow more room for aesthetic choices, they can just as easily end up uglier. Dumb and ugly is no good at all. Still, bar charts gave rise to things like stacked bar charts, which allow us to visualize observations from multiple investigations that share the same variables, so I do not consider them a step backwards.

What about global body mass index?

The information in the graphs above comes from the World Health Organization’s database of global body mass index. The numbers represent the percentage of people in the overweight or obese range of the body mass index in individual countries, NOT the average body mass index of individual countries. Notice that one country [American Samoa] has over 90% of its adult population in the overweight or obese range. If you’re curious, the US has 66.9% of our adults in the overweight+obese range. Vietnam is on the low end with only 5% of its adults overweight or obese.

References

World Health Organization Global Database on Body-Mass Index. [Last accessed 17 November 2012]

Global smoking rates by gender

What works

The Economist put together an infographic using data from a study published last week in The Lancet. The data were collected by an impressively large team of researchers from three different institutions in three different countries (The World Health Organisation, America’s Centres for Disease Control and the Canadian Public Health Association). The article in The Lancet has much more detailed data about all sorts of smoking traits that did not make it into this chart, but the chart succeeds in portraying two gendered vectors of smoking behavior: the different rates of smoking between men and women and the difference in the number of cigarettes smoked between the two genders.

Globally speaking, it is safe to say that smoking is a masculine activity. There is no country in which more women than men are smokers. That particular take-away is made extremely clear in the chart. Just a glance is enough exposure to the data to absorb the idea that smoking is somehow masculine.

What needs work

The graphic designers at the Economist try to expand on the notion that smoking is “somehow masculine” by layering another set of findings onto the basic rates of smoking by men and women. Way off to the right they have what is essentially two columns of a table that report the average number of cigarettes smoked by men and women. My fuzzy and addled brain wants this little table to be more like a bar chart in which the length of the bars corresponds to the number of smokes. Countries where smokers go through the most cigarettes would have longer bars. Countries where they smoke fewer would have shorter bars. Visually, the impact would increase dramatically if the size of the bar corresponded to the number of cigarettes smoked.

Importantly for the point about the gendered nature of smoking, we could see another way in which smoking is gendered by looking at how many cigarettes are smoked by each gender. Some countries have dramatic differences: in Russia and Turkey men smoke about 1.5 times as many cigarettes as women. This is a marked contrast to the other end of the spectrum where in India, women who smoke (and there are very few women who smoke in India), smoke 7 cigarettes per day while the smoking men only smoke 6.1 cigarettes per day. If that part of the graphic had been given more space, it would have been easier to quickly absorb that pattern. As it is, only a careful reading of that table yields insight; we might as well just look at the data in Excel.
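
To make the bar idea concrete, here is a rough sketch in Python that turns cigarettes per day into text bars and computes the men-to-women ratio. It uses only the India figures quoted above, since those are the only exact values I have reproduced here; the `bar` helper and its scale of four characters per cigarette are arbitrary choices for illustration.

```python
def bar(value, scale=4, char="#"):
    """Return a text bar whose length is proportional to value."""
    return char * round(value * scale)

# India's figures are the only per-country values quoted in this post.
india = {"women": 7.0, "men": 6.1}

for gender, per_day in india.items():
    print(f"{gender:<6}{bar(per_day)} {per_day}")

# Men's cigarettes per day relative to women's; below 1 means women smoke more.
ratio = india["men"] / india["women"]
print(f"men/women ratio: {ratio:.2f}")
```

Even in plain text, the shorter bar for men makes the reversal in India visible without reading the numbers.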

The other change I would order up for this graphic is to make the blue horizontal bars that run the full length of the graphic a different color than the male icon. My best option would have been to make the horizontal bars grey and truncate them after the male icon. There’s no need for them to go all the way across and it makes the table slightly harder to read. I realize that changing the horizontal bars to grey would then give the whole table a gridlike look due to the presence of the vertical bars. I would just shorten the vertical bars to tick marks at the top and tick marks at the bottom (it is a tall chart so tick marks only at the top or only at the bottom would be invisible to people who have to scroll to see the whole graphic).

I like the coral color used for the female icons. I would have turned the men navy because coral and navy are complementary colors and look especially good together.

I wasn’t able to add the bar graphs out to the side or to fully eliminate the baby blue, but I did make some of the changes I suggested on the jpg below for your viewing ease.

Remix of The Economist Daily Chart from 20 August 2012 - Puffed Out: Daily cigarette smoking by men and women

References

The Economist. (20 August 2012) Puffed Out: Daily cigarette smoking by men and women The Economist: Daily Charts. [graphic design]

Giovino, Gary, et al. (18 August 2012) Tobacco use in 3 billion individuals from 16 countries: an analysis of nationally representative cross-sectional household surveys. The Lancet, Volume 380, Issue 9842, Pages 668 – 679, doi:10.1016/S0140-6736(12)61085-X

London Underground – Historical Ads

What works

The London Underground has a lengthy history of using infographic thinking in their advertisements (see these ads and more on retronaut.co). What works here is that some of these ads, especially the first one, could still be used with positive impact today if the silhouettes were updated to include the transit types actually on the street out there. If I saw an infographic that compared the speed of walking (with and without a stroller), taking the subway, taking a cab, and biking to incite me to take the subway or bike, I would find that compelling. I’d imagine many New Yorkers would agree with me. Probably so would Londoners. It is remarkable how long lasting this ad is.

What needs work

The ad needs a better implementation of the scale associations in the miles per hour figures, one that would help communicate the idea that the underground is faster than all the other modes of mobility. If someone were to make this infographic today, they would probably make the slower forms of mobility look shorter (almost like applying a bar graph where the slower mobility forms haven’t made it as far across the page). They probably would also scale the size of the numbers representing miles per hour. Maybe the numbers would become more and more italicized, leaning farther and farther to the right to indicate speed. Maybe they would just get bigger as they approached the fastest speed.

Moving on in time, I think the next ads for the London Underground are actually not as strong as this first one, at least until we get to 1969. We see below a graphic that is supposed to help Londoners understand what their Underground fares are actually funding, but there is no scale comparison available from one ‘bar’ in the bar graph to the next. What’s more, the numbers associated with the bars are represented by the coinage. The viewer has to do the math by himself or herself. Personally, I find that to be a kind of naive approach to representing the fare distribution, one that has the viewer doing mental work to add up coinage, which is kind of incidental to the question, rather than comparing one category of expenditures to the others, which is the heart of the question that was posed.

The Independent Group, Pop Art, and London Underground ad improvements

This ad is much better and more compelling. It still carries the idea of infographic representation over from the fare split into coinage, but it represents people not as dots but as actual people (or passenger cars). A photo of a street full of cars stretching so far we can’t see the end of it steps to a photo of just the human bodies carried by those cars, and finally to all those humans on a single bus. This particular instantiation of that idea is much stronger. I imagine the advertisers here were influenced by the artistic work of the UK’s Independent Group, who were the British counterparts of American Pop Artists.

London Underground Ad 1969

Artistic comparison

Just for fun, compare the ad above with some work by American Feminist Artist Barbara Kruger (for you non-art history people, feminist art followed pop art and used a lot of performance work but also maintained some of the pop art movement’s interest in the tropes of advertising, collage techniques, and the use of text in art. See also later conceptual artist Jenny Holzer.)

Barbara Kruger "Your gaze hits the side of my face" 1982

Barbara Kruger "Your Manias Become Science" 1981

Barbara Kruger "Untitled" 1981

References

Retronaut.co (4 January 2012) London Transport Infographics, 1912-1969 [blog post].

Kruger, Barbara. (1981) Untitled. [collage] Accessed online at http://www.eng.fju.edu.tw/Literary_Criticism/feminism/kruger/kruger.htm

Kruger, Barbara. (1982) “Your gaze hits the side of my face” [collage] Accessed online at New York University’s Fales Collection at Bobst Library http://www.nyu.edu/library/bobst/research/fales/exhibits/downtown/soho/sohoart/documents/kruger.html.

Map smoothing technique from David Sparks

What works

I like these maps because they use a smoothing technique currently being developed by David Sparks, a doctoral candidate in political science at Duke University. He uses data with the same kind of granularity – county or census-tract – but then smooths over the harsh (and probably unrealistic) edges that can occur where one county or census block abuts another with a different value for the variable of interest.

Here’s an example of a typical, non-smoothed map visualization using a map made by sociology students at Queens College that I posted about last week:

Percent change in hispanic population in the Dakotas, Minnesota, Iowa, Nebraska, Kansas, Oklahoma

As you can see in this map, each county boundary is stark and it appears that there are cases in which counties with no growth in the Hispanic population are right next to counties with sizable increases in Hispanic people. While this is technically true, there are many cases in which it is more useful to give viewers a clearer impressionistic image that depicts where population concentrations are the highest overall backed up by the granular data without displaying all of the granularity itself.

When it is important to portray an impressionistic point – there are more Democrats on the coasts than in the middle of the country – a smoothed map is a much more effective tool.

Sparks not only achieved a better impressionistic glance by smoothing, he also varied the transparency based on population density. For instance, because the population density in Montana is much lower than the population density in New York, he made Montana a much more ‘transparent’ state so that it would be easy to get an impressionistic sense of the cumulative spread of the variable. In the purple map of Hispanic population increase in the middle states, by contrast, no consideration was made for the population densities of cities versus rural areas. That visualization style tips the impressionistic balance away from the more densely populated areas.

What needs work

Since I am generally a fan of the smoothed maps for a clear visual depiction of a data story that is meant to be digested from the 30,000-foot view rather than the microscopic examination of differences between counties or even residential blocks, there is not much to dislike in Sparks’ new smoothed maps. However, I would not recommend the use of this kind of smoothed data for looking at micro-level trends. What Sparks offers is a great way to see patterns from 30,000 feet, one that improves on existing common practices in visualizing map data.

My one issue with the map of people’s political persuasion in 2008 is that the colors on the ends of the spectrum – blue and red – blend to form the color in the middle of the spectrum – purple. Therefore, places in which there are lots of independents look purplish. So do places where people living close together are evenly split between Republicans and Democrats. Color choice is essential. The colors at the ends of the spectrum should not mix to produce the color chosen to represent a third position. This is a small quibble and one that Sparks would have had a hard time satisfying, since the colors associated with Republicans and Democrats have already been established.
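
The color problem is easy to demonstrate with a little arithmetic. The sketch below is in Python and assumes a simple linear mix in RGB, which may well not be how Sparks colored his maps; the `blend` helper and the specific RGB values, including the purple for independents, are my own illustrative choices.

```python
def blend(color_a, color_b, share_b):
    """Linearly mix two RGB colors; share_b is the weight on color_b (0 to 1)."""
    return tuple(round(a + (b - a) * share_b) for a, b in zip(color_a, color_b))

BLUE = (0, 0, 255)      # Democrats
RED = (255, 0, 0)       # Republicans
PURPLE = (128, 0, 128)  # assumed color for independents

# A place evenly split between the two parties.
even_split = blend(BLUE, RED, 0.5)

print("even split:", even_split)
print("same as the independent color:", even_split == PURPLE)
```

Under these assumptions an evenly split area and an area full of independents get the same color, which is exactly the ambiguity described above.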

References

Sparks, David B. (2011) Isarithmic maps of public opinion data [blog post and map graphics] dsparks.wordpress.com

Rural midwest population bolstered by Hispanic Americans

What works

This is a quiet story, the kind of thing that may or may not be picked up by a major national newspaper like the New York Times. Rural America is often used as a political flag to wave by politicians, but there is not often too much coverage of day-to-day life. The 2010 Census clearly shows,

The Hispanic population in the seven Great Plains states shown below has increased 75 percent, while the overall population has increased just 7 percent.

What is unusual is that this story is running two graphics – the set of maps above and the one below – that more or less depict the same thing. I salivate over things like this because it gives me a chance to compare two different graphical interpretations of the same dataset.

The two maps above include a depiction of the change in the white population as a piece of contextual information to help explain where populations are growing or shrinking overall. These two maps show that 1) in many cases, cities/towns that have experienced growth in their Hispanic populations also saw increases in their white populations (hence, there was overall population growth) but that 2) there are some smaller areas that are experiencing growth in the Hispanic population and declines in the white population.

The second map shows only the growth in the Hispanic population without providing context about which cities are also experiencing growth in the white population. Looking at the purple map below, it’s hard to tell where cities are growing overall and where they are only seeing increases in the Hispanic population which is a fairly important piece of information.

Percent change in hispanic population in the Dakotas, Minnesota, Iowa, Nebraska, Kansas, Oklahoma

What needs work

For the side-by-side maps, the empty and colored circles work well in the rural areas but get confusing in the metropolitan areas. For instance, look at Minneapolis/St. Paul. Are the two central city counties – Hennepin and Ramsey – losing white populations to the suburbs? That is kind of what it looks like but the graphic is not clear enough to show that level of detail. But at least the two orange maps allow me to ask this question. The purple map is too general to even open up that line of critical analysis.

This next point is not a critique of the graphics, but a direction for new research. The graphics suggest, and the accompanying article affirms, that Hispanic newcomers are more likely to move into rural areas than are white people. Why is that? Is it easier to create a sense of community in a smaller area, something that newcomers to the area appreciate? If that is part of the reason new people might choose smaller communities over larger ones, for how many years can we expect the newcomers to stay in rural America? Will they start to move into metro areas over time for the same reason that their white colleagues do?

Are there any other minority groups moving into (or staying in) rural America? Here I am thinking about American black populations in southern states like Alabama, Mississippi, and Arkansas. Are those groups more likely to stay in rural places than their white neighbors? For that matter, what about white populations living in rural Appalachia? Are they staying put or are they moving into cities like Memphis, Nashville, and Lexington?

How do things like educational attainment and income levels work their way into the geographies of urban migration?

References

Queens College Department of Sociology. (13 November 2011) Changing Face of the Rural Plains [information map graphics] Queens College: New York.

Sulzberger, A.G. (13 November 2011) Hispanics Reviving Faded Towns on the Plains New York Times: New York.

Prison vs. Princeton: Annual cost comparison

Education vs. Prison in California | Public Administration

What works

The only part of this graphic I kind of liked was the part about California. Here, we are able to compare the average cost of education for a year with the average cost of prison for a year. This is better than comparing the cost of a single school to the average cost of prison, especially when that school is as expensive as Princeton. I still have a problem with this comparison because the cost of school is running over about 8 months whereas the cost of prison is running the full 12 months, or at least that seems to be true from what I can gather. My back-of-the-envelope math suggests prison would be about $32,143 for 8 months. This is still much higher than the average of $7,463 per student spending for 8 months of school. Parent and student contributions to schooling are not factored in, though the point of the graphic is to compare what the state spends on students to what it spends on prisoners, ignoring the total amount spent on students.
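
The back-of-the-envelope adjustment is just proration: scale a 12-month cost to 8 months. Here is a small sketch in Python using only the two 8-month figures above. The annual prison cost is not restated in this post, so the code backs it out from the $32,143 figure rather than assuming a number; the `prorate` helper is a made-up name.

```python
def prorate(annual_cost, months):
    """Scale a 12-month cost down to the given number of months."""
    return annual_cost * months / 12

# Figures quoted in the post, both treated as covering 8 months.
prison_8_months = 32_143
school_8_months = 7_463

# Back out the annual prison figure implied by the 8-month estimate.
implied_prison_annual = prison_8_months * 12 / 8

# How many times more the state spends on a prisoner than on a student.
ratio = prison_8_months / school_8_months

print(f"implied annual prison cost: {implied_prison_annual}")
print(f"prison vs. school over 8 months: about {ratio:.1f}x")
```

Even after putting both on the same 8-month footing, prison spending comes out at roughly four times the per-student figure.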

What needs work

The information included in this graphic could have been presented in about one fifth of the space. I support the addition of graphical elements to information presentation only when they increase the clarity of the information provided or make the information delivery inarguably more elegant.

What I strongly dislike are the long columns of graphics stacked on top of each other, meant to be viewed as some kind of visual essay. That was where I drew the California graphic from. I pasted it below.

I’m curious. Do other people like these long, internet-only graphic essays? I find them extremely hard to digest. They seem to be plagued by apples-to-oranges faux comparisons, and unabashedly so. A year’s tuition at Princeton doesn’t include room and board. Prison does. Even if that were taken into account, the time frame is off.

One more item to highlight

Note that in the last panel they clue us into an uncomfortable reality: recent college graduates have a higher unemployment rate (12%) than the general population (9%). Ouch.

References

Public Administration. (October 2011) “Prison vs. Princeton” [information graphic]

Resnick, Brian. (1 November 2011) Chart: One year of prison costs more than Princeton The Atlantic online.

Comparing skyscraper height and weight-bearing paths

What works

Following my post about Kate Ascher’s new book, “The Heights: Anatomy of a Skyscraper”, I realized that one of the things I liked best about her take on skyscrapers was that she found a way to compare skyscrapers to their alternatives. Skyscrapers are usually just compared to one another, stripped of their urban contexts. So there will be a graph like the one below with skyscrapers from all over the world – different cities and climates and purposes – all lined up in height order.

What I like about the graphic at the top (that originally appeared in Scientific American) is that it goes beyond the all-too-common height comparison and describes how weight and other architectural engineering concerns are handled.

What needs work

Given that the graphic by Beau and Allen Daniels was commissioned to appear alongside a magazine article that I have not read, I am not qualified to discuss what is NOT working. I retrieved the image from their digital portfolio, which did not mention the date or title (or author) of the Scientific American piece with which it ran.

References

Ascher, Kate. (2011) The Heights: Anatomy of a Skyscraper. New York: Penguin Press. [see my blog post about Ascher's new book here]

Daniels, Beau. Skyscraper Construction and web-based portfolio.

Time and Newsweek Circulation Figures

Time and Newsweek Circulation Figures | Graphic by Laura Norén

Newsweek and Time Circulation Figures | Graphic by Yolanda Cuomo

Which one works?

These two graphics portray some of the same information – household income, median age, audience and circulation – though the first one does not break down information between genders. Though it probably goes without saying, I like the one I designed best. The second one has some tantalizing shapes – I applaud the visual appeal – but it does nothing to aid people’s eyes as they try to compare relative sizes between the salient categories. I also happen to think it is easier to understand the complexity of the difference between audience and circulation with the textual explanation provided in the first one. I find the white-font-on-dark-background of the Time and Newsweek labels hard to read. It is also a known graphic design no-no, especially with a small font size like this: it is easier for the human eye to grok the contrast with dark text on a light background than with light text on a dark background.

From a sociological perspective, comparing the readership of Time and Newsweek not only to each other but also to national averages provides a much deeper sense of context. The second graphic was built from the first though I never had a chance to meet with any of the writing or design team to understand why the national averages were removed.

There are other elements I dislike in the second one. I dislike, for instance, the need to repeat certain elements of text over and over again: “readers per copy” and “Total adult population” and even the “Time” and “Newsweek” headings. One of my closest friends and colleagues spends a lot of his time writing code. The best lesson I have learned from him is that where elements or actions have to be repeated over and over, there is inefficiency in the system. A better design is possible.

I would love to hear from my readers on this comparison. Am I suffering from too much ego investment in the graphic I made? Is the second graphic an improvement on the first? If so, how?

References

Norén, Laura. (2010) “Appendix: Data and Methods” in first draft of Dill, Nandi and Telesca, Jen Imagining Emergencies. [Information graphic].

Cuomo, Yolanda. (2011) “Readership Data Time and Newsweek 2008” in final draft of Dill, Nandi and Telesca, Jen Imagining Emergencies. [Information graphic].

US agricultural commodity subsidies by state, 2010 | New graphic

Beans

Overview

On Tuesday I read “When One Farm Subsidy Ends, Another May Rise to Replace it” OR “Farmers Facing Loss of Subsidy May Get New One” by William Neuman [aside: why does the NY Times frequently have two titles for the same article? One appears in the title tags in the html and in the URL, the other appears at the top of the article as it is read]. The upshot of the article is that the subsidies appear to be curtailed as cost-saving measures but come right back under new names:

It seems a rare act of civic sacrifice: in the name of deficit reduction, lawmakers from both parties are calling for the end of a longstanding agricultural subsidy that puts about $5 billion a year in the pockets of their farmer constituents. Even major farm groups are accepting the move, saying that with farmers poised to reap bumper profits, they must do their part.

But in the same breath, the lawmakers and their farm lobby allies are seeking to send most of that money — under a new name — straight back to the same farmers, with most of the benefits going to large farms that grow commodity crops like corn, soybeans, wheat and cotton. In essence, lawmakers would replace one subsidy with a new one.

Neuman also interviewed Vincent H. Smith, a professor of farm economics at Montana State University who, “called the maneuver a bait and switch” saying,

“There’s a persistent story that farming is on the edge of catastrophe in America and that’s why they need safety nets that other people don’t get. And the reality is that it’s really a very healthy industry.”

My curiosity was piqued, to say the least. Farm subsidies have long been an emotionally charged issue – Professor Smith is right to point out that the family farmer is an icon in the American zeitgeist whose ideal type gets trotted out as a narrative to support subsidies that often go to large-scale corporate agriculture. Before mounting my own angry response to what appears to be both a hypocritical move and a well-orchestrated marketing schmooze (i.e., the public proclamation by various farm lobbies that they are willing to take fewer subsidies as they band with the rest of the beleaguered American public in a collective belt-tightening process while simultaneously opening up other routes to receive the same amount of funding through different mechanisms), I decided to go in search of some hard data to see what is going on with agricultural subsidies.

Agricultural data

I found two great sources of data. First, the USDA runs the National Agricultural Statistics Service which publishes copious amounts of tables full of information about how much farmland there is in the US, what is grown on it, what the yields are, what commodity prices are, what farm expenditures are doing, and all sorts of rich information. Linked from the article was another source of data – the Environmental Working Group – which has been tracking farm subsidies for years. The Environmental Working Group also relies on the National Agricultural Statistics Service, especially for farm subsidy information. Between those two sources, the US Census, and the 2012 US Statistical Abstracts (Table 825 especially), I had more than enough information to start putting together a graphic that could describe at least part of what is going on with agricultural subsidies.

Selecting the right data

Because farming is distributed unevenly around the country, I knew I needed to come up with a set of numbers that went beyond absolute dollar amounts per state. Probably it would have been nice to see where subsidies go per crop, but other people have already done that.

To look at agricultural subsidies overall, and to work with the state-by-state data that I had, I ended up considering three approaches.

  • 1. Absolute commodity subsidy amounts per state.
  • 2. Commodity subsidy amounts per capita.
  • 3. Commodity subsidy amounts per farmland acre.

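I settled on the third option, looking at the amount of spending per acre within each state. Because farmland is distributed so unevenly, dividing by acres ties the subsidy to the amount of land actually being farmed, which neither the absolute totals nor the per capita figures do.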
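For readers who want to see how the three measures differ, here is a minimal Python sketch. The two states and all of their figures are invented for illustration and are not the 2010 values; the function name `subsidy_measures` is likewise my own.

```python
# Hypothetical figures for illustration only; NOT the real 2010 data.
# name: (commodity subsidy in dollars, population, farmland acres)
states = {
    "State A": (50_000_000, 5_000_000, 10_000_000),
    "State B": (50_000_000, 1_000_000, 2_500_000),
}


def subsidy_measures(subsidy, population, acres):
    """Return the three candidate measures for one state."""
    return {
        "absolute": subsidy,                  # option 1: total dollars
        "per_capita": subsidy / population,   # option 2: dollars per resident
        "per_acre": subsidy / acres,          # option 3: dollars per farmland acre
    }


for name, (subsidy, population, acres) in states.items():
    m = subsidy_measures(subsidy, population, acres)
    print(f"{name}: ${m['absolute']:,} total, "
          f"${m['per_capita']:.2f} per capita, "
          f"${m['per_acre']:.2f} per acre")
```

In this made-up case both states receive the same absolute amount, yet State B costs four times as much per acre, which is the kind of difference the absolute totals hide.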

Hypothesis

I expected to find that states with small amounts of farmland would be relatively more expensive per acre than states with large amounts of farmland. I assumed there would be economies of scale and that states with very large amounts of farmland probably had a lot of that land dedicated to pasture, which is pretty cheap to maintain compared to something like an orchard.

Attempt Number 1

I decided that simply showing the costs per acre might not be as interesting as keeping the absolute amount of farmland in play and doing some kind of comparison.

Rank comparisons are extremely popular and I admit I was sucked into them, though now that I’ve tried to make them, I kind of hate them. These are the kinds of comparisons that you’ll hear on the news – Ohio ranks Yth in per capita income but Zth in educational spending per pupil – and see in graphics that often look like this:

My first attempt to do something similar looked like this.

US Agricultural Commodity Subsidies | Process Graphic 01

Here are my problems with it:

  • There is no obvious pattern – it looks like a rat’s nest.
  • The states with bad ratios – the ones where we are paying more than $10/acre – have upward sloping lines connecting them from the left column to the right column. Psychologically, the ‘bad’ deals should have downward sloping lines. It just makes better visual sense.
  • Pink was supposed to be along the lines of red on accounting sheets but it looked too cheery to indicate being ‘in the red’.

Attempt Number 2

US Agricultural Commodity Subsidies - Process Graphic 2

I got rid of the pink altogether and flipped the scale on the left so that the best deals – the lowest per acre subsidy costs – are at the top. This means that states that are taking less per acre end up having upward sloping lines more often than downward sloping lines.

Thinking through this brought up some larger concerns. Comparing by rank alone is ridiculous. The space between each listing in both columns is extremely critical in a graphic like this and needs to be scaled appropriately. For instance, look at Alabama ($6.06) and Oklahoma ($6.07) in the right hand column. They basically have the exact same amount of spending per acre and yet they are the same distance apart as Washington ($9.86) and Minnesota ($11.37). The same problem happens in the lefthand column – states with about the same amount of acreage dedicated to farmland have the same distance between them as states with large differences in the amount of acreage they have dedicated to farmland.
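To put numbers on the spacing problem, here is a rough Python sketch that uses only the four per-acre figures quoted above. It compares evenly spaced rank slots with positions proportional to the dollar values. The function names and the 0 to 100 column height are my own assumptions, and because only four states are included, "adjacent" here means adjacent within this small subset rather than in the full 50-state list.

```python
# Per-acre commodity subsidy figures quoted in the post (dollars per acre, 2010).
per_acre = {
    "Alabama": 6.06,
    "Oklahoma": 6.07,
    "Washington": 9.86,
    "Minnesota": 11.37,
}


def rank_positions(values):
    """Evenly spaced slots: a state's position is simply its rank (0 = lowest)."""
    ordered = sorted(values, key=values.get)
    return {name: float(i) for i, name in enumerate(ordered)}


def scaled_positions(values, height=100.0):
    """Positions proportional to the value itself, mapped onto 0..height."""
    lo, hi = min(values.values()), max(values.values())
    return {name: (v - lo) / (hi - lo) * height for name, v in values.items()}


ranked = rank_positions(per_acre)
scaled = scaled_positions(per_acre)

for a, b in [("Alabama", "Oklahoma"), ("Washington", "Minnesota")]:
    print(f"{a} to {b}: rank gap = {ranked[b] - ranked[a]:.0f}, "
          f"scaled gap = {scaled[b] - scaled[a]:.1f}")
```

By rank, both pairs sit exactly one slot apart. Once the positions are scaled to the dollar values, Alabama and Oklahoma nearly overlap while Washington and Minnesota are separated by more than a quarter of the column.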

Attempt 3

US Agricultural Commodities by State, 2010

Click here to see a pdf of the whole graphic.

I scaled both the right and left hand columns using a log scale for farmland acreage (though the number of acres is still given in absolute millions of acres – only the visual arrangement was logged). The pattern is still messy and hard to discern, though clearer than in previous versions. In order to bolster the pattern, I turned the ‘good deals’ in the lefthand column pink. The states with less acreage dedicated to farmland routinely receive less subsidy per acre than some of the bigger states. But the very biggest farming states – like Montana and Texas – are also pretty affordable on a per acre basis. It was states near the middle of the pack that were coming in at $18 and $19 per acre of commodity subsidy spending.
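The logged arrangement can be sketched roughly as follows. This is not the code behind the graphic, just a small Python illustration of the idea: the vertical position comes from the log of the acreage, while the label still reports the absolute number of acres. The acreages, labels, and the 0 to 100 column height are hypothetical.

```python
import math


def log_position(acres, min_acres, max_acres, height=100.0):
    """Map an acreage onto a 0..height column using a log10 scale."""
    span = math.log10(max_acres) - math.log10(min_acres)
    return (math.log10(acres) - math.log10(min_acres)) / span * height


# Hypothetical acreages for illustration only; not real state data.
acreages = {
    "Small": 100_000,
    "Medium": 1_000_000,
    "Large": 10_000_000,
    "Huge": 100_000_000,
}

lo, hi = min(acreages.values()), max(acreages.values())
for name, acres in acreages.items():
    y = log_position(acres, lo, hi)
    # The label shows absolute acreage; only the position is logged.
    print(f"{name:<7} {acres / 1e6:>6.1f} million acres at y = {y:5.1f}")
```

On a linear scale the three smaller entries would be crushed into the bottom tenth of the column. On the log scale each tenfold jump in acreage takes up the same vertical distance, so the smaller states stay legible.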

I thought maybe it was a weather event that led to some of the larger subsidies. But if that were the case, states that were geographically near one another would probably have had the same drought/hurricane/flood and should have received similar funding. There is work to be done on the weather question – looking at data over time would be a good step in the right direction there.

However, I don’t know that weather is going to be the best answer to this question. Look at Washington and Oregon. They are geographically right next to each other, grow some similar kinds of things, and have a similar amount of farmland acreage yet they have dramatically different amounts of subsidy spending per acre. Washington takes $9.86 per acre; Oregon gets $2.51 per acre. It’s still unclear why there is such a great disparity between these two states in 2010.

Falsified hypothesis

Through the construction of this information graphic, I falsified my own hypothesis. The states with the smallest amount of land dedicated to farmland received the least commodity subsidy per acre, not the most.

I have some thoughts about what is going on. They will require more data analysis and graphic development to suss out and represent completely.

New Hypotheses

  • 1. It’s the weather. It could still be the weather. I did not do enough investigation into this variable, though this seems like a weak hypothesis.
  • 2. It’s corn. The states that grow a lot of corn seem to get more subsidies. This hypothesis could easily be expanded to be something more sophisticated such as: “Subsidies per acre are sensitive to the commodity grown.”
  • 3. It’s lobbying. The states that are known to be “big farm” states seem to have more funding than smaller farm states. Maybe they are better represented by the farm lobbies and therefore end up with more subsidy per acre than states without strong representation from the farm lobby. This hypothesis has an overlap with the “it’s corn” hypothesis.

Conclusion

There are two kinds of conclusions to be drawn. On the agricultural front, it is safe to conclude that Americans spend a good bit of money per acre of farmland; there is no free market on the farms. Bigger states do not offer economies of scale compared to states with less farmland acreage. No additional conclusions can be drawn from this limited data, though interesting hypotheses can be posed about the influence of local weather events, funding for specific commodities like corn, and the impact of lobbyists’ efforts on agricultural funding allocations.

As a graphic exercise, I hope I have shown that rank orderings do not offer much analytical value on their own. I hope I have also suggested that graphics can be used not only for representing findings at the end of the process but for discovering patterns. Graphics are not just for display; they are also for discovery.

References

Neuman, William. (2011, 17 October) “When One Farm Subsidy Ends, Another May Rise to Replace it” Business Section, nytimes.com.

Norén, Laura. (2011) “US Agricultural Commodity Subsidies by State, 2010” [Information graphic] and [Data Table - this is a combination of data and analysis originally published by the National Agricultural Statistics Service, a public-facing branch of the USDA].

Environmental Working Group. A good source for information on agricultural subsidy spending.

United States Census Bureau, Statistical Abstract of the United States (2011) Agriculture.

United States Department of Agriculture, National Agricultural Statistics Service.