Author Archives: Laura Norén

US Hispanic Population, 2010


What works

The Hispanic population is the fastest growing minority ethnic group in America. In the previous post about Race and Ethnicity in America, I showed the overall racial and ethnic proportions in America (2010 data). The graphic here specifically looks at what we mean when we say Hispanic in America. The predominant country of origin for Hispanic Americans is Mexico, accounting for almost two-thirds of the Hispanic population (63%). The Mexican American population continues to grow; Mexico is a much more populous place than, say, Puerto Rico, Cuba, or the Dominican Republic, which is one explanation for the disparity in locations of origin. However, because Puerto Rico is part of the United States, it is the next largest source of Hispanic Americans at 9.2%, followed closely by Hispanics from Central American countries at 7.9%.

What needs work

Admittedly, the graphic is nothing special, just a stacked bar. I’m sharing it because it seemed miserly of me to withhold it, since it offers a better understanding of the ethnic make-up of America than the previous graphic alone. I probably should have posted it in the previous post, but it’s too late for that now.

References

Ennis, Sharon, Merarys Ríos-Vargas, and Nora Albert. 2011. The Hispanic Population. Census Briefs 2010. US Census Bureau.

Race and ethnicity in America


What works

This graphic does a great job of depicting race and ethnicity as distinct concepts. The orange hash marks above the racial groupings indicate the proportion of people in the racial categories that are also Hispanic by ethnicity. I made this to correct the graphics that lump race and ethnicity together (and – bafflingly – they still add up to 100%).

Race and ethnicity are not the same. Race refers to differences between people that include physical differences like skin color, hair texture, and the shape of eyelids, though the physical characteristics that add up to a social decision to consider person A a member of racial group 1 can change over time. Irish and Italian people in America used to be considered separate racial groups, based in part on skin color distinctions that most Americans could no longer make. What does “swarthy” look like anyway?

Ethnicity – a closely related concept – refers to shared cultural traits like language, religion, beliefs, and foodways. Often, people who are in a racial group also share an ethnicity, but this certainly isn’t always true. American Indians are considered a racial group but there are hundreds and hundreds of distinct tribes in the US and their religions, beliefs, foodways, and languages vary from tribe to tribe. Hispanics in America often share common language(s) (Spanish and/or English) but they may not share the same race. At the moment, most Hispanics in America self-identify as white. I have often wondered if, when I’m 60, the ethnic boundaries currently describing Hispanic people will have faded away, much like the boundaries describing Italian and Irish folks faded away, becoming more of a symbolic ethnicity that can become more important during the holidays and less important during day-to-day life.

What needs work

The elephant on the blog is that I have been on hiatus since February. I’m writing my dissertation and I plan to stay on hiatus through the spring to finish that. My decision may seem irresponsible from the perspective of regular readers and I apologize for my absence.

Close-up of graphic


As for the graphic, it was designed to run along the bottom of a two-page spread so it does not work well here on the blog. If anyone wants a higher-resolution version to use in class or in a powerpoint, shoot me an email and I’ll send it.

References

US Census, 2012 using 2010 data.

Pacific Gyre overview

Through the Gyre by Jacob McGraw-Mickelson via GOOD Transparency


Information graphics and Illustrations

Information graphics generally do not include significant elements of illustration. It is even more rare that they are dominated by illustrations the way “Through the Gyre” is. Jacob McGraw-Mickelson created the illustration – it’s his imagination of what the Pacific Gyre might be like, not an anatomical cross-section. Using an illustration in a place where we come to expect something schematic and, therefore, representative of reality could be a dangerous play-on-truth using images and conventional expectations to convince viewers of a truth they will never be able to confirm. The Pacific Gyre is nearly impossible to visualize because it is operating at competing scales. The pieces of plastic are teeny tiny but they cover a swath of ocean that’s about as big as Texas. Reports of its density at various depths are still being developed.

Because the gyre is so difficult to visualize, McGraw-Mickelson’s illustration of it has an easy time standing in for reality. We have no other photographs or scientific diagrams (yet) that aim to give us a visual overview. The ease of convincing a viewership could be seen as a kind of deceit-with-images, but I prefer to think of it as art in the service of environmentalism. It may not be ‘representative’ of reality or even provide a schematic for thinking through oceanographic relationships. But it does bring gravity and depth to the following factoids that were developed as more traditional information graphics around the main illustration of the gyre.

Location of Pacific Gyre - Zoom of "Through the Gyre" by Jacob McGraw-Mickelson and GOOD transparency


Make-up of plastic pieces in the Pacific Gyre - Zoom of "Through the Gyre" by Jacob McGraw-Mickelson


Impact of the Pacific Gyre - Zoom of "Through the Gyre" by Jacob McGraw-Mickelson and GOOD transparency


What Works

This graphic gave me a whole new way to think through problems related to representing important concepts and ideas that do not have clear schematics, photos, or graphics but can inspire deep reflection. I bought a copy of the print of just the ‘gyre’ to remind myself to be cautious about my embrace of the American lifestyle. I could end up eating that plastic bag again someday as it makes its way into the food cycle.

References

McGraw-Mickelson, Jacob. (2009) “Through the Gyre” [illustration] featured in 2009’s best information graphics at GOOD Transparency blog.

Can an outer space graphic inform small group network graphs?


50 years of space Exploration | by Sean McNaughton and Samuel Velasco


50 years of space exploration, zoom-in | by Sean McNaughton and Samuel Velasco

Why space exploration is like a small-group network graph

This blog is supposed to be about social data and while there are certainly social components to space exploration, that’s not the angle I am going to discuss here. [See Alex Madrigal’s piece in The Atlantic, “Moondoggle: The forgotten opposition to the space program”, to get a taste of the sociopolitical forces behind the American space program.] Rather, what excited me about this graphic was the form and its potential application to relatively small network visualizations. Here’s what I’m thinking: say you have small work groups (like, for instance, in my dissertation) and you would like to visualize some kind of behavior or linkage pattern in that network. You might also like to have the power hierarchy in the visualization – and this would be the structural hierarchy that exists in relation to, but not as a cause of, the pattern of linkages and/or traffic in the network. You could use a nest-y network map like this:

Clear, well-visualized network graph


OR…the formal standards in the space exploration graphic could be modified to suit network traffic, assuming a network with a small number of nodes. The planets could be people and they could be scaled and positioned to reflect their structural hierarchy. The edges – which in the space graphic are the trips – could be meetings or emails or any other kind of linkage that is important in the network. In the case of meetings, some meetings last longer or are otherwise more consequential so the edge could be thicker or more saturated with color.
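The mapping described above can be sketched as a small data structure. This is only a minimal illustration in Python, not a finished design: the names `Person`, `Link`, `node_radius`, and `edge_width`, the rank scale, the sample people, and the size constants are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    rank: int  # hypothetical scale: 1 = top of the structural hierarchy


@dataclass
class Link:
    source: str
    target: str
    minutes: float  # hypothetical measure: length of a meeting


def node_radius(person, base=40.0):
    """People higher in the hierarchy (smaller rank number) get larger circles."""
    return base / person.rank


def edge_width(link, longest, max_width=8.0):
    """Line thickness grows linearly with meeting length."""
    return max_width * link.minutes / longest


# Made-up example data for a three-person work group.
people = [Person("director", 1), Person("manager", 2), Person("analyst", 4)]
links = [Link("director", "manager", 60), Link("manager", "analyst", 15)]
longest = max(link.minutes for link in links)

for person in people:
    print(person.name, node_radius(person))
for link in links:
    print(link.source, "->", link.target, edge_width(link, longest))
```

Color saturation could stand in for thickness in the same way; the point is only that hierarchy drives the nodes while the linkage data drives the edges.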

Lots of network analysis looks at big networks where the nest-y network graph visualization technique is a good fit. But networks with fewer nodes and edges in which we know something about the social structure of the arrangement end up losing some of that context when they are represented in the nest-y network graphs. Those graphs are designed to help identify patterns where researchers either do not know much about the patterns in the first place or want to find an unbiased way to test their assumptions about the patterns they will find. But with the networks I am studying, I have discovered social patterns through ethnographic methods that I would like to have represented in my graphs. This space exploration graphic looks a lot like my back-of-napkin sketches for small groups. Of course, it is far more polished and better integrated with the ‘site plan’ running along the bottom of the graphic that helps establish scale, much like the way architects include a thumbnail site plan on their blueprints to establish a context for the siting of the building that’s represented in much greater detail on the plan.

Coming attractions

Over the next week, I hope to have a better sketch of a small-group network informed by ethnographic research up on Graphic Sociology.

References

Graphic Designers
Sean McNaughton, National Geographic Staff, www.nationalgeographic.com
Samuel Velasco, 5W Infographics, www.5wgraphics.com [this website was under renovation at the date of this blog post]

Madrigal, Alex. (Sept. 2012) “Moondoggle: The forgotten opposition to the space program”. The Atlantic.

Hat-tip to Adam Crowe and his flickr account (http://www.flickr.com/photos/adamcrowe/sets/72157622579426670/).

Vintage American infographics | Susan Schulten

Mapping the Nation by Susan Schulten

I reviewed Susan Schulten’s new book, Mapping the Nation: History and cartography in 19th Century America, for publicbooks.org, but so many of the images (90%) did not make it into that review that I decided to write a post here, too. This blog tends to focus on contemporary graphics, but information graphics are not new and the historical context of infographic forms is fascinating, especially in light of research that examines the status of information graphics as the output of inscription devices (Latour and Woolgar, 1979). How did we end up with the selection of graphic forms we now have? In what way were these images originally used and by whom?

The images in Schulten’s book – and on her superb companion website – are mostly maps, but there are also a surprising number of information graphics. As Schulten writes, maps and mapping were both made possible because America became a country (and thus had a government that could be petitioned to support the expense of creating maps and provide a centralized repository in which maps could be collectively held and made available) and they made America an imaginable possibility. In short, the establishment of American government made mapping possible and the existence of national maps made America an imaginable possibility. Without being able to see not only the colonies, but also the rest of the North American continent, it would have been far more difficult to imagine and pursue westward expansion, for instance. The first chapters of the book provide a nice companion to Benedict Anderson’s “Imagined Communities” that focused on the role of newspapers and novels in creating a national imagination. Schulten is also interested in printed matter, but for her the big deal is mapping.

Maps as propaganda

If mapping in the immediate post-colonial and early frontier eras was exciting – and it was – it got even more exciting during the contentious lead-up to the Civil War. One of the maps I’m including here is propaganda for the abolition of slavery. I have included the whole map as well as a close-up, but I encourage you to click through to Schulten’s companion website where you will find high quality scans of all the maps that will give you far more detail than I am able to show here.


Antebellum Historical Geography Map of US | by John Smith

Close-up of antebellum historical geography map | John Smith


Propaganda is typically not something maps are used for now, at least not in the blatant fashion of the pre-Civil War years, but it is true that maps are depictions of political boundaries and, as such, are ripe for the delivery of political messages. [For a more recent example of US maps used in politically charged ways see modern artist Jasper Johns.]

What I found more intriguing were the maps that displayed their political messages almost invisibly using choropleth techniques. The choropleth technique is still extremely common today and relies on shading assigned to political divisions like state or county lines. Census tract boundaries can also be used. It’s debatable whether or not census tracts are political boundaries but they certainly are not boundaries based on natural features like streams or mountain ranges. Some of the first choropleths were developed to show more precise locations and densities of slave labor in an effort to discredit Southern claims that slavery covered the South like a blanket without which Southern economies would freeze.
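The shading logic behind a choropleth can be sketched in a few lines. This is a minimal illustration in Python, not a reconstruction of how any of the historical maps were made: the class breaks, shade names, and county values are made up, and `shade_class` is a hypothetical helper.

```python
import bisect


def shade_class(value, breaks):
    """Return the index of the shading class that `value` falls into.

    `breaks` holds the ascending lower bounds of every class after the first,
    so a value equal to a break lands in the higher class.
    """
    return bisect.bisect_right(breaks, value)


# Made-up class breaks (in percent) and shade names, lightest to darkest.
breaks = [10, 25, 50]
shades = ["lightest", "light", "dark", "darkest"]

# Made-up values for four political divisions.
counties = {"A": 4, "B": 10, "C": 37, "D": 80}

for name, value in counties.items():
    print(name, shades[shade_class(value, breaks)])
```

Every location inside a division receives the same shade, which is why the choice of boundaries and class breaks shapes the message a choropleth delivers.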

Slavery map of US, 1861 | Edwin Hergesheimer

Slavery map of US, 1861 [closeup] | Edwin Hergesheimer

Another attempt at a similar political message – to display variation in slave holdings in order to prove that other economic models were viable and operant in the South during the 1850s – failed as a map but introduced an interesting graphical form. This Missouri map shows county boundaries within each of which there is a small graphic with the overall intent of providing:

A view of the numerical relation of slaves to agricultural wealth in Missouri, Showing in each county the number of slaves to every ten thousand dollars worth of farms and farming implements according to the US Census of 1850.

To interpret the map, then, keep in mind that counties with more dots rely more heavily on slave labor rather than mechanical labor. Of course, counties with few dots could either be utilizing human labor more efficiently, and thus have lower slave-to-machine ratios, or they could have had very little agricultural practice of any kind, slave or free. Because the graphic elements represent such an obscure, unfamiliar measure (slaves-to-machines), the map ought not to be considered a great success. But it is an excellent example of maps depicting thematic data without resorting to choropleths. We could use more of this boundary-pushing map-graphic hybridity now.

Missouri Slave Density map, 1850 | Edwin Leigh


Disease mapping in America

With some chagrin, I admit Schulten’s book corrected an inaccurate belief of mine with respect to the use of maps in the detection of disease. I had erroneously thought that John Snow was the first person to use maps as a tool to detect the cause of disease when he traced the cause of London’s cholera epidemic to a public water pump. He was not the first to use maps to discover disease. Americans in Baltimore, Boston, and New Orleans were mapping all sorts of potential causes of diseases like cholera including weather patterns, train routes, proximity to open water, and the eventual culprit, proximity to public water. Snow was the first to home in on the cause, but he was not the first to use maps. Further, he was likely aware of American public health mapping efforts.

Cholera map of Boston, 1849 | Henry Williams


Cholera map of Boston, 1849 [closeup] | Henry Williams


Bonus image

I am including one more image – not a map – to show just how fresh 19th century graphics were. This is a graphic that uses states as categories but breaks them out of the map form in order to present them as squares. It is easier to divide squares into percentages, which is just what Francis A. Walker did to show the types of church denominations present from one state to the next. It is easy to see why he avoided using a map – it would be difficult to divide the irregular shapes of states into precise percentages. Further, even if he could divide the irregular areas properly, if he then filled the areas with particular denominations, it would have appeared that the denominations were geographically tied to particular places within the states. His choice of squares as representations of the states is logical. From this graphic solution to his problem we end up with a visual technique for representing all sorts of information that is bound to related categories.


Church denominations in the US by state | Francis A. Walker via Susan Schulten

References

Latour and Woolgar. (1979) Laboratory Life: The Social Construction of Scientific Facts. Beverly Hills: Sage.

Schulten, Susan. (2012) Mapping the Nation: History and cartography in 19th Century America. Chicago: University of Chicago Press. [see also Mapping the Nation website]

Norén, Laura. (2013) Mapping a young America: Review of Susan Schulten’s “Mapping the Nation: History and cartography in 19th Century America”. PublicBooks.org

The Functional Art by Alberto Cairo | book review

The functional art book cover

Cairo, Alberto. (2013) The Functional Art: An introduction to information graphics and visualization. Berkeley: New Riders, a division of Pearson.

Overview

The Functional Art is a book divided into four parts, but really it is easier to understand as only two parts. The first part is a sustained and convincing argument that information graphics and data visualizations are technologies, not art, and that there are good reasons to follow certain guiding principles when reading and designing them. It is written by Alberto Cairo, a professor of journalism at the University of Miami and an information graphics journalist who has had the not always pleasant experience of trying to apply functional rules in organizational structures that occasionally prefer formal rules.

The second part of the book is a series of interviews with journalists, designers, and artists about graphics and the work required to make good ones. This part of the book is as much about the organizational culture of art and design and specifically of graphics desks in newsrooms as it is about graphic design processes. The process drawings are fantastic. I’ve included two of them here. The first by John Grimwade is multi-layered, full of color and dynamic vitality. These qualities were carried through into the final graphic but are often very difficult to build into computer-generated images. I wondered if the graphic would have been as dynamic if it had come from a less well-developed hand sketch (or no sketch at all).

The second is a set of photographs taken of a clay model by Juan Velasco and Fernando Baptista of National Geographic that was used to recreate an ancient dwelling place called Gobekli Tepe that was in what is now Turkey. Both of these examples lead me to the iceberg hypothesis of graphic design – the more the design that shows up in the newspaper or magazine is just the tip of an iceberg of research, development, and creative work, the more accurate and engaging it is likely to be.

As a sociologist I am accustomed to reading interviews and am fascinated by the convergence and divergence in the opinions represented. In this case, I especially appreciated that Cairo’s interview questions touched on the organizational structures and working arrangements, as did his own anecdotes throughout the book, to provide an understanding of the opportunities and constraints journalists and information graphic designers face. Their work is massively collaborative and the book works to reveal the bureaucratic structures that come to promote and impinge upon design processes and products.

There is a fifth part to the book, too, a DVD of Cairo presenting the material covered in the first three chapters of the book. I admit, I have rarely been a big fan of DVD inclusions. They are easy to lose, scratch and/or break. And even assuming the DVD is intact and accessible, I never know when I ought to stop reading and start watching. Even if the book has annotations indicating that an obedient reader should stop reading and start watching the DVD, this assumes the reader is willing and able to put down the book and fire up the computer. The only time I can imagine using the DVD is as a teaching aid in class to give the students a break from having to listen to me all the time. Unfortunately, that is prohibited by Pearson.

Still, it is worth watching because Cairo has a great voice and he is able to discuss interactive content/design in a way that is not easy in the pages of the book. While some of the discussion repeats themes from the first part of the book, there are new examples from additional designers, including some who have been Cairo’s students, which might be of interest to people thinking of signing up for his online course.

What does this book do well?

"Brazilian population grows more in prisons" graphic

“Brazilian population grows more in prisons” by Alberto Cairo originally in Epoca magazine November 2010, reprinted in “The Functional Art” by Alberto Cairo in 2013.

The book does a great job of explaining the decision making behind graphic design. The sketches, process drawings, and accounts of the conversations that went on in editorial meetings gave important depth of context. The organizational culture and day-to-day expectations of the newsroom tend to encourage the use of templates and discourage exuberant creativity. Cairo explained that this Brazilian prison graphic, which eventually won the Malofiej design award, also won him a reprimand from his boss, who proclaimed it to be “ugly”. In practice, conceptual distinctions between art and technologies for comprehension are made rigid by bureaucratic structures in which “the infographics director is subordinate to the art director, who is usually a graphic designer,” an arrangement that “can lead to damaging misunderstandings.”

The more prominent argument follows from these peeks into the backstage of journalism. Infographics and visualizations are technologies, not illustrations. Cairo writes that:

The first and main goal of any graphic and visualization is to be a tool for your eyes and brain to perceive what lies beyond their natural reach….The form of a technological object must depend on the tasks it should help with….the form should be constrained by the functions of your presentation….the better defined the goals of an artifact, the narrower the variety of forms it can adopt.

One of the writing techniques that Cairo uses is summarizing his take-away points from previous paragraphs in quick lists of pointers or key questions. Cairo incorporated these quick lists gracefully into the writing style and I never felt like I was reading a textbook. Still, the quick lists make it easy to use the book as a reference. The index, bibliography and detailed table of contents add strength to the book as a reference source, too. Note to the publisher: I found it frustrating that the book did not include a list of figures, especially given the subject matter.

"Home and Factory Weaving in England, 1820-1880" graphic

“Home and Factory Weaving in England, 1820-1880” Otto and Marie Neurath Isotype Collection, University of Reading as seen in The Functional Art by Alberto Cairo.

Diversity

One of the greatest strengths of this book is the diversity of sources from which Cairo draws his material. Yes, he uses graphics he has developed in many cases which is hugely valuable because he is able to provide insights into the development processes. However, he also draws from graphics old and new [see an old one he pulled out of an archive at the University of Reading about weaving in the industrial revolution], from magazines, newspapers, and the internet, made by freelancers, in-house designers, and students, and in languages other than English (some of which are translated, some of which impressively need little translation). My favorite graphic in the book was one I never would have come across that uses pieces of fruit to describe the surgical procedures used to achieve sexual reassignment.

“How sex change surgeries work.” by Renata Steffen, William Vieira, Alex Silva and Sergio Gwercman in Superinteressante magazine (Brazil). Part 1 of 2.

“How sex change surgeries work.” by Renata Steffen, William Vieira, Alex Silva and Sergio Gwercman in Superinteressante magazine (Brazil). Part 2 of 2.

This diversity serves as an example of the breadth of Cairo’s experience in the world of journalistic information graphics. It is also a testament to his real joy in the subject. Many authors of design books are happy to fill the pages with their own work. Cairo is surely talented enough to have done so. Instead, he chose to showcase an incredible range of designers and styles. This diversity, combined with the accessibility of the writing, is cause enough to recommend this book for anyone who is curious about graphics and journalism, especially journalism students.

What doesn’t this book do well?

The most curious shortcoming – given the incredible diversity of designers, styles, countries, and publication types represented – is the scarcity of women designers. There are thirteen designers profiled in part IV of the book; only two are women. There were forty-seven graphics reprinted; five were designed by women. With respect to the reprints, Cairo is completely justified in reprinting his own work more often than the work of others because he knows how the design process unfolded in those cases. Since he is a man, this inflates the masculine contribution to the reprinted graphics category. Still, many of the graphics he worked on were collaborative efforts and his collaborators could have been women in a more ideal world. But mostly, they were men.

Because the information graphics world is relatively interdisciplinary and (so far as I know) has no specific professional organization whose membership includes a representative sample of practicing information graphics and data visualization professionals, it is hard to tell if the gendered pattern in Cairo’s book is due to some oversight on his part or the underlying gendered make-up of the industry or a combination of both. Even if the industry is dominated by men, it is important for people who write and edit textbooks to ensure that women are represented or they run the risk of sending the message that women may not be welcome or well-rewarded if they choose to pursue data visualization. That is unacceptable. The graphics world will lose out on half its talent pool and women might avoid careers that could have been satisfying and rewarding for them. Notably, the kinds of graphic design that require coding – like data visualization and interactive design – are better compensated than illustration and static design so it’s possible that women are being subtly nudged into the less well-compensated areas of graphic design along the line. It would have been nice if this textbook that is so diverse in so many other ways could have pushed the gender boundary and included more women.

The book also over-promises in the cognition section. The first chapter on cognition was too basic. The second and third chapters in this section had more that was directly applicable to design. All three chapters could have been condensed into one. It is certainly true that perception and cognition ought to be included and there were some useful applications derived from the three chapters, but there was too much review and too few clear applications of the basic principles of cognition and perception to graphic design.

Here are the pointers I did find useful, if you happen to want to buy the book and skip those chapters:

+ If you want viewers to estimate changes by visually comparing elements, you will have the best luck if those changes are depicted using elements of the smallest number of dimensions possible. For instance, viewers will have an easier time coming up with an accurate estimate of the difference in size between two lines (1D) than between two circles or squares (2D). It’s best to avoid 3D comparisons altogether. I would also add that regular objects like circles and squares are cognitively easier to think with than irregular objects like polygons other than squares.

+ The less frequently a color appears in nature, the more likely it is to draw the eye. Reserve the use of colors like red, pink, purple, orange, teal, and yellow for elements that are meant to draw attention.

+ Humans cannot focus on multiple elements at the same time. Design graphics that have one focal point or clear hierarchies of focal points. Do this by eliminating unnecessary use of bright color, chart junk like grid lines that aren’t absolutely necessary, and by establishing a logical information hierarchy in the page layout.

+ Landscapes have horizon lines. Humans are used to encountering the world this way. This is one reason why it is easier to make comparisons using bar graphs (where all the elements start from a common horizon line) rather than pie charts (where there is no shared horizon).

+ Eyes are good at detecting motion and they will focus attention on moving objects. Try not to ask viewers to read text and simultaneously watch a moving element in interactive graphics.

+ Human brains are good at picking out patterns. Often, fairly small changes to a graphic layout that strengthen the appearance of grouping or other types of patterns will add to the ability of the graphic to deliver an instant impression or overview of the message being communicated. For instance, changing the spacing of the bars in a bar graph so that every fourth bar has twice as much space after it as all the rest will make the graph appear to have groups of 4-bar units.

+ Interposition – placing one object in front of another so they overlap – is a good way to add depth. If objects never overlap, the opportunity for the illusion of depth is lost.
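The bar-spacing pointer in the list above can be made concrete with a short sketch. This is a minimal Python illustration under stated assumptions: `bar_positions` is a hypothetical helper, and the bar width and gap values are arbitrary.

```python
def bar_positions(n_bars, width=1.0, gap=0.5, group_size=4):
    """Return the left edge of each bar.

    The gap after every `group_size`-th bar is doubled, so the bars
    read as visual groups without any extra labels or colors.
    """
    positions = []
    x = 0.0
    for i in range(n_bars):
        positions.append(x)
        x += width + gap
        if (i + 1) % group_size == 0:
            x += gap  # second helping of gap after every fourth bar
    return positions


print(bar_positions(8))
# [0.0, 1.5, 3.0, 4.5, 6.5, 8.0, 9.5, 11.0]
```

The fifth bar starts at 6.5 rather than 6.0, and that extra half unit is all it takes for the eye to split eight bars into two groups of four.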

Summary

Overall, the book was well-written, included valuable insight into the process underlying the creation of strong, successful information graphics and visualizations, and would be a solid textbook for use in journalism departments. The representation of women designers was disappointingly low and the segment on cognition could be condensed or otherwise improved. Cairo is clearly a talented designer and teacher. This book meaningfully combines both of those strengths and is an important contribution to undergraduate and graduate education in the emerging sub-discipline of information visualization and design.

I am sending you out with one of the graphics I was most impressed by, in part because the graphic is good, but mostly because Cairo helped me to see why a rather average looking graphic is in fact rather brilliant. It is by Hannah Fairfield of the New York Times graphic desk and it shows that the driving behavior of Americans is sensitive to changes in the economy. During the 2005 recession when gas prices were high but the economy was struggling overall, Americans drove fewer miles. This pattern had only one historical precedent – the 1970s. The graphic depicts this by having a timeline that appears to walk backwards during those two periods in history, a broken pattern your pattern-loving mind is likely to fixate on once you realize this is not your average line graph. Smart.

References

Cairo, Alberto. (2013) The functional art: An introduction to information graphics and visualization.

Fairfield, Hannah. (2010) Driving Shifts into Reverse New York: New York Times.

Grimwade, John. (1996) The Transatlantic Superhighway. [information graphic]. New York: Conde Nast Traveler.

Steffen, Renata; Vieira, William; Silva, Alex and Gwercman, Sergio. “How sex change surgeries work.” Superinteressante magazine. Brazil.

Velasco, Juan and Fernando Baptista. () “Gobekli Tepe Process Shots”. National Geographic Magazine. In Cairo, Alberto (2013) The Functional Art p. 238.

Urban and rural housing vacancy rates

What works

The “Ghost Counties” interactive visualization by Jan Willem Tulp that I review in this post won the Eyeo Festival at the Walker Art Center last year. The challenge set forth by the Eyeo Festival committee in 2011 (for the Festival happening in 2012) was to use Census 2010 data to create a visualization that did not rely on maps or, if it did rely on maps, used them in a highly innovative way. This is an excellent design program – maps are over-used. Yet it’s one thing to assert that maps are over-used and another thing to produce an innovative graphic representation that is not a map.

Tulp does a great job of leaving the map behind. He also does a phenomenal job of incorporating a large dataset (8 Mb of data serve the images in the interactive graphic from which the stills in this post were captured). The graphic has a snappy response time once it has loaded and his work makes a solid case for the beautiful union of large data and clear representation thereof.

The color scheme is great and reveals itself without a key. Those counties with low vacancy are teal, those sort of in the middle are grey-green, and those with high vacancy are maroon. The background is light, but not white. White would have been too stark – like an anesthetized space. He experimented with darker backgrounds (see his other options at his flickr stream here) but those ended up presenting an outer space feel. The background color he settled on was (and is) the best choice. Background colors set the tone for the entire graphic, along with the font color, and Tulp’s work is positive evidence of the value of carefully considering them.

Pie charts might be better than circles-in-circles

The dot within a dot is difficult for the eye to measure. Pie charts – which I only recommend if there are very few wedges – would have worked well with this type of data because there are only two wedges (see here for an example of a two-wedged pie chart). I just finished reading Alberto Cairo’s important new book The functional art and he had a solid critique of the circle-in-circle approach that helped me realize what’s so appealing, but just plain wrong, about circles-in-circles:

“Bubbles are misleading. They make you underestimate difference….If the bubbles have no functional purpose, why not design a simple and honest table? Because circles look good. (emphasis in original)”

In this case, a wedge in a pie chart could have represented the percent of total housing units occupied.

Why is it so hard to ‘see’ rural vs. urban?

The x-axis is a log scale for population size. It’s clear from what we know about the general trend towards urbanization that we would expect urban areas to have lower vacancy rates than rural areas. Even in 1990 – two census surveys before the 2010 data that was used here – the New York Times ran a story about the population decline in rural America and there has been widespread coverage of the trend towards urbanization by both journalists and academics (the LSE Cities program does nice work).

The two states shown here – New York and Minnesota – both have some big cities and a whole lot of small cities in rural areas. Some small cities are also in suburban areas. That’s a problem with this visualization: the distinctions that have been established in academic literature between rural, suburban, ex-urban, and urban are difficult to pick out of this visual scheme. While it would be difficult to find a sociologist who could wrangle the data to produce this kind of visualization, I imagine many of my intellectual kin would be confused by this visual scheme and demand to return to a map-based graphic because at least in that case they could see patterns associated with the rural-urban spectrum the old-fashioned way. I am not wedded to the notion that a map is the only way to “see” the rural-urban spectrum, but the current configuration makes it difficult to think with the existing literature about housing patterns even though the attempt to distinguish between population sizes was built into the graphic on the x-axis. Population size is not always a great proxy for urban vs. rural, so it is a weak operationalization of spatial concepts social scientists have found to be meaningful. For instance, a small, exclusive ex-urban area filled with wealthy folks and their swimming pools is conceptually much different from a small, depopulating rural town even if they have roughly similar population sizes.

It is important in a research community to build on good existing work and reveal the weaknesses of existing work where it’s falling short. Either way, it is a bad idea to ignore existing work. Where a project does not relate to existing work – neither building momentum in a positive direction nor steering intellectual growth away from blind alleys – it will likely become an orphan. In this case, the project is only an orphan with respect to urban scholarship. As a computational challenge, it most definitely advanced the field of web-based interactive visualization of large datasets. As a visual representation, it adhered to a design aesthetic that I would like to see more of in academic work. But as a sociological analysis, it’s nearly impossible to ‘see’ clearly or with new eyes any of the existing questions around housing patterns. It is also my opinion – and this is far more easily contested – that it does not raise new important questions about housing patterns in urban, suburban, or rural America either.

My critique here is not that all data visualization is pretty but useless and that we should stick to our maps because they tie us to our existing disciplines and silos of knowledge. Rather, my critique is that in order for data visualization to become a useful tool in the analytical and communication toolkits of social scientists, the work of social science is going to have to find a way into the data visualization community. As anyone who has tried to use Census data knows, looking at piles of data is not synonymous with analysis. While Tulp’s graphics certainly present an analysis, that analysis seems to have turned its back on a fairly sizable swath of journalism on urbanization, not to mention the hefty body of academic work on the same set of topics.

Graphic Sociology exists in part to find a way to keep social scientists motivated to produce higher quality infographics and data visualizations than what is currently standard in our field. But the blog is equally good for sharing a social scientific perspective with computer scientists and designers who are ahead of us with respect to the visual analysis and display of social data. There is a way to bring the strengths of these fields together in a meaningful, positive way. We are not there yet.

References

Cairo, Alberto. (2013) “The Functional Art: An introduction to information graphics and visualization.” Berkeley: New Riders.

Eyeo Festival.

Tulp, Jan Willem. (2011) “Ghost Counties” [Interactive Visualization] Submitted to Eyeo Festival and selected the winner in 2012.

Visualizing email traffic

Editing process in graphic design

The editing process in graphic design is somewhat different than the editing process in writing. Writers tend to start with a skeleton, make sure the bones are all in the right places, and then slowly add and sculpt musculature and skin through iterative processes. Graphic designers start with a whole bunch of skeletons, subtract a few, add musculature to the rest, subtract a few of those, add skin to the remaining ones, and then only late in the process will a single design go through a final polishing process.

One of the ways social scientists teach students to become skeptical about the things they read is by teaching them how to edit their own work and the work of others. Students start to see how pieces of written work represent a series of choices. They see that what they’ve read could have gone in other conceptual directions, used different evidence, been shortened, lengthened, stripped of jargon, or otherwise constructed and styled in new ways that could have changed the meanings taken away by the readers. Learning to construct, critique, and polish writing is a major part of how readers develop the tools they need to understand and analyze the works they read.

There is far less educational time spent teaching students how to create visual work, especially visual work outside of the realm of personal expression (I feel like most arts programs emphasize personal expression which is different than creating visual work with the intent of displaying data or even political messaging). It is not surprising that we end up with a bunch of people who struggle to apply an analytic lens to information graphics. This leads to a communications power imbalance that privileges certain kinds of visual devices, including information graphics, over writing inasmuch as information graphics are more likely to be accepted without too much scrutiny since most folks do not have a good idea where to begin to scrutinize them. Information graphics combine the moral authority of numbers with the cognitive inertia of sight that lies behind the cliche that ‘seeing is believing’.

In the service of pulling back the curtain on graphic design, I thought it might be useful to save an entire series of drafts in the development process of a graphic that describes the email traffic in a small design work group. The purpose is to break the seal around the image and reveal it is a series of decisions that might easily have been otherwise.

First Draft

First, I thought a stem and leaf diagram might work.

Stem and Leaf diagrams of office email traffic

But these graphics failed because there was no way to keep strings of receiving or sending visually united. If the people in the office happened to be sending (or receiving) a series of emails that spanned one ten-minute period and the next, that run would be visually broken. I also wasn’t thrilled with the way the sent email matched up with the received email. It was hard to see that when one person in the office sent an email, it would often land in the inbox of someone else in the office.

Still, I liked the version where I turned the numbers into balls and that idea came back in a different form later in the development process.

Second Draft

I decided to abandon the stem and leaf for a timeline. I initially imagined triangles as markers for the email because I thought the shape would indicate the directionality of an email going out into the internet.

email traffic timeline, version 1

This version has an entire day on one page, morning sits above afternoon.

And I tried some different color schemes.

Email traffic timeline, version 1.1

Email traffic timeline, version 1.1 stretching the day across two pages.

Email traffic timeline, version 1.2

The triangles did not work and some of the color schemes created a sense of vibration. A trained graphic designer might have tried the triangles (and rejected them, of course), but they would not have made the mistakes with color that I did.

Third draft

I replotted the graphic with circles, not triangles, and added up all the emails that were received in 5-minute periods instead of plotting each individually. This lost a bit of granularity, but it made it easier to see where traffic was greatest because it allowed the height of the circles to start drawing the eye.
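
For anyone who wants to try the same aggregation, here is a minimal Python sketch of adding up emails in 5-minute periods. It is not the process I actually used for the graphic: the function name `bin_emails` and the sample timestamps are made up for illustration, and it assumes the bin width divides evenly into 60 minutes.

```python
from collections import Counter
from datetime import datetime


def bin_emails(timestamps, minutes=5):
    """Count emails per fixed-width time bin; keys are the bin start times."""
    counts = Counter()
    for ts in timestamps:
        # Floor the minute to the start of its bin, e.g. 9:07 becomes 9:05.
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        counts[start] += 1
    return dict(sorted(counts.items()))


# Made-up example: three emails received within ten minutes.
received = [
    datetime(2012, 11, 1, 9, 1),
    datetime(2012, 11, 1, 9, 4),
    datetime(2012, 11, 1, 9, 7),
]
for start, n in bin_emails(received).items():
    print(start.strftime("%H:%M"), n)
# 09:00 2
# 09:05 1
```

Each count can then set the size of one circle on the timeline, which is where the loss of granularity comes from: the two emails at 9:01 and 9:04 become a single mark.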

Email timeline, version 1.3
There is another page to the right of this one but viewing the image at this scale displays more detail.

This version is much closer to the final but something was missing.

Fourth draft

I started to realize that the timelines were difficult to analyze so I went back to the data and pulled out some summary statistics about the average number of emails each person sent and received. I also thought it would be interesting to see how much of the officewide traffic each person generated. While I was looking for new ways to help people understand what they were looking at, I also showed them the range of reality in the same timeline format by pulling out the lines for the highest traffic person-day and the lowest traffic person-day. I also remembered one of the lessons I learned from reading Nathan Yau’s Visualize This and added some descriptive text. [A full review of that book is here.]

Office email traffic

This is as far as I have gotten. But if I get good suggestions in the comments, I’ll keep improving.

What can writers learn from graphic designers

Getting through this many drafts alone was hard. It is very hard to see the same thing with new eyes. I got some help from two different people and even though neither of them said much, their opinions made a huge difference in the process. I encourage writers to find a way to share their work with others earlier in the process. It is humbling. If the comparison to graphic design is apt, earlier sharing either of the whole draft or of smaller sections will also likely lead to a stronger piece that gets written faster.

Stem and leaf diagrams

Stem and leaf diagram becomes a histogram

What works

The stem and leaf diagram is an old stand-by that has largely been abandoned in social science as it morphed into the histogram. It is a rather ingenious graphical device that could be created even with a typewriter, which is how people used to prepare documents not that long ago. And when I say ‘people’ used to prepare documents, I am actually imagining wives and girlfriends of the husbands and boyfriends who were preparing final drafts of their dissertations and later the (mostly female) secretaries, administrators, and lab assistants typing up articles and figures for (mostly male) professors. [Refer to this graphic on the gendered nature of degrees at the doctoral level for supporting evidence that it was mostly men writing dissertations and then getting the jobs available to people who had written dissertations.]

How to make a stem and leaf diagram

1. Start with numerical data. Organize it from least to greatest.

2. Think of each number as having a stem and a leaf. The stem is the more durable part of the number and the leaf is the more sensitive part of the number. For a number like 57, the more durable part of the number is the ‘5’ because even if there was some variation in the measure, the number in the tens spot might not change but the ‘7’ in the singles spot is more sensitive and thus more likely to flutter like a leaf. If we were measuring temperature, for instance, it would be a lot more likely that the day would have temperatures like 56 and 58 than 60-something and 40-something. Thus, the tens spot is the stem and the singles spot is the leaf in this case. It would be possible to use measurements in the hundreds or even thousands.

3. Once you have identified your stems and leaves, type the lowest stem value. Then type a bar or some other vertical device to separate your stem from your leaves. Then look at all the observations you have for that stem value. Type in every single observed leaf value for that stem, starting with the lowest one. So if you are creating a diagram of all the temperatures registered at noon for the month of November, you will have 30 values to stick in your chart. You will probably have something like three values in the 30s – say, 35, 37, and 38. This would mean you would type a 3, then a vertical bar, then 5, 7, and 8. If there were also nine values in the 40s – say 40, 41, 42, 42, 43, 45, 45, 46, and 48 you would hit carriage return. Then you’d type a 4, a vertical bar, and 0 1 2 2 3 5 5 6 8. You see how people (mostly women) could use typewriters to make graphics.

The strength of this technique is that it forces the actual dataset into a visually organized diagram. All of the values can be read right out of the graph but the device as a whole gives an impression of the overall pattern.

4. At some point after typewriters, the stem and leaf diagram morphed into a histogram. I think Excel had something to do with this, but I am still researching just how it was that the stem and leaf diagram was relegated to the dustbin while the histogram rose to take its place.
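
The typewriter procedure in steps 1 through 3 translates almost directly into code. Here is a minimal Python sketch using the example temperatures from step 3. The function name `stem_and_leaf` is my own, and it assumes non-negative whole numbers with the tens spot as the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem and leaf diagram for non-negative integers."""
    leaves = defaultdict(list)
    # Step 1: organize the data from least to greatest.
    for v in sorted(values):
        # Step 2: the tens spot is the stem, the singles spot is the leaf.
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    # Step 3: one line per stem, a vertical bar, then every observed leaf.
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}"
            for stem in sorted(leaves)]


temps = [35, 37, 38, 40, 41, 42, 42, 43, 45, 45, 46, 48]
for line in stem_and_leaf(temps):
    print(line)
# 3 | 5 7 8
# 4 | 0 1 2 2 3 5 5 6 8
```

One simplification to be aware of: this sketch skips stems that have no observations, so a gap in the data would not show up as an empty row the way it would if you typed every stem by hand.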

Worth thinking about

Stem and leaf diagrams are close cousins of bar charts and histograms. While bar charts and histograms might be more attractive in some ways, they are, in fact, less data-rich. It is not possible to read the actual values out of a colored bar. The histogram *could* be more visually pleasing than the stem and leaf diagram, but the fact that histograms allow more space for aesthetics means that they can just as easily be uglier, not more appealing, than stem and leaf diagrams. Dumb and ugly is no good at all. Still, bar charts gave rise to things like stacked bar charts that allow us to visualize observations for multiple investigations that share the same variables, so I do not consider them a step backwards.

What about global body mass index?

The information in the graphs above comes from the World Health Organization’s database of global body mass index. The numbers represent the percentage of people in the overweight or obese range of the body mass index in individual countries, NOT the average body mass index of individual countries. Notice that one country [American Samoa] has over 90% of its adult population in the overweight or obese range. If you’re curious, the US has 66.9% of our adults in the overweight+obese range. Vietnam is on the low end with only 5% of its adults overweight or obese.

References

World Health Organization Global Database on Body-Mass Index. [Last accessed 17 November 2012]

Visualize This | book review

Hot Dog Eating Contest Graph

Hot Dog Eating Contest Graph – Large version

Preface to the book review series

There are two ideal types of infographics books. One ideal type is the how-to manual, a guide that explains which tools to use and what to do with them (for more on ideal types, see Max Weber). The other ideal type is the critical analysis of information graphics as a particular type of visual communications device that relies on a shared, though often tacit, set of encoding and decoding devices. The book reviews I proposed to write for Graphic Sociology include some of each kind of book, though they lean more towards the how-to manuals simply because more of that type have come out lately. As with all ideal types, none of the books will be wholly how-to or wholly critical analysis.

I meant to review two of Edward Tufte’s books first so that we would start off with a good grounding in the analytical tools that would help us figure out which parts of the how-to manuals were likely to lead to graphics that do not commit various information visualization sins. However, I have spent the past six weeks at a field site (a graphic design studio, no less) and it rapidly became completely impractical to lug the two oversized, hardcover Tufte books around with me. I found Nathan Yau’s paperback “Visualize This” to be much more portable so it skipped to the head of the line and will be the first review in the series.

The Tufte review is next up.

Review of Visualize This by Nathan Yau

Visualize This book cover

Yau, Nathan. (2011) Visualize This: The FlowingData guide to data, visualization, and statistics Indianapolis: Wiley.

Visualize This is a how-to data visualization manual written by statistician Nathan Yau who is also the author of the popular data visualization blog flowingdata.com. The book does not repeat the blog’s greatest hits or otherwise revisit much familiar territory. Rather, this was Yau’s first attempt to offer his readers (and others) a process for building a toolkit for visualizing data. The field of data visualization is not centralized in any kind of way that I have been able to discern and Yau’s book is a great way to build fundamental skills in visualization that use tools spanning a range of fields.

The three primary tools that Yau introduces in the book are two programming languages – R and Python – and the Adobe Illustrator design software. Both R and Python are free and supported by a bevy of programmers in the open source world. R is a programming package developed for statistics. Python has a much broader appeal. Both of them can produce data visualizations. Adobe Illustrator is neither free nor open source but it is worth the investment if you are planning to do just about any kind of graphic design whatsoever, including data visualizations. Yau mentions free alternatives, and there are some, but none have all of the features Illustrator has.

Much of the book starts readers off building the basic bones of a visualization in R or Python, based on a comma-separated value data file that has already been compiled for us by Yau. He notes that getting the data structured properly often takes up more than half the time he spends on a graphic, but the book does not dwell much on the tedium of cleaning up messy data sources. Fine by me. One of the first examples in the book is a graphic built and explored in R, then tidied up and annotated in Illustrator using data from Nathan’s Hot Dog Eating contest.

This process is repeated throughout:
   1. start visualizing data with programming;
   2. try to find patterns with programming;
   3. tidy up and annotate output from program in Illustrator.

The panel below shows you what R can do with just a few lines of code. Hopefully, it also becomes clear why it is necessary to take the output from R into Illustrator before making it public.

Visualize This – example from chapter 4

Great tips

There are hints and tips sprinkled throughout the book covering everything from where to find the best datasets to how to convert them into something manageable to how to resize circles to get them to accurately represent scale changes. This last tip is one of my favorites. When we visualize data and use circles of varying sizes to represent the size of populations (or some other numerical value) what we are looking at is the area of the circle. When we want to represent a population that is twice as big as the size of some other population, we need to resize the circle so its area is twice as big, not its circumference.
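
The arithmetic behind the circle tip is short enough to sketch. Area grows with the square of the radius, so to make the area proportional to the data value the radius has to grow with the square root of the value. Here is a minimal Python sketch. The function name `scaled_radius` and the example numbers are my own illustration, not code from the book.

```python
import math


def scaled_radius(base_radius, base_value, value):
    """Radius of a circle whose area is proportional to value.

    base_radius is the radius already chosen to represent base_value.
    """
    return base_radius * math.sqrt(value / base_value)


# A population of 100 is drawn with radius 10.
# A population twice as big needs twice the area, not twice the radius.
print(round(scaled_radius(10, 100, 200), 2))  # 14.14
print(scaled_radius(10, 100, 400))            # 20.0
```

Doubling the radius instead would quadruple the area, which is why the last line shows radius 20 standing for a population four times as large.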

How to scale circles for data visualization

More great tips:
1. First, love the data. Next, visualize the data.*
2. Always cite your data sources. Go ahead and give yourself some credit, too.
3. Label your axes and include a legend.
4. Annotate your graphics with a sentence or two to frame and/or bolster the narrative.

*Love the data means take an interest in the stories the data can tell, get comfortable with the relationships in the data, and clean up any goofs in the dataset.

Pastry graphics: Pie and donut charts

Yau’s advice about pie charts diverges from mine. I say: use them only when you have four or fewer wedges because human eyes really have trouble comparing the area of one wedge to another wedge, especially when they do not share a common axis. Yau acknowledges my stubborn avoidance of pie charts but advises a slightly different attitude:

Pie charts have developed a stigma for not being as accurate as bar charts or position-based visuals, so some think you should avoid them completely. It’s easier to judge length than it is to judge areas and angles. That doesn’t mean you have to completely avoid them though. You can use the pie chart without any problems just as long as you know its limitations. It’s simple. Keep your data organized, and don’t put too many wedges in one pie.

Then Yau explains how to visualize the responses to a survey he distributed to his own readers at FlowingData to see what they’d say they were most interested in reading about. He showed the readers of the book a table with the blog readers’ responses which I’ve recreated below [Option A]. I think the data is easier to read in the table than in either the pie chart or the closely related donut chart [Option(s) B]. In life as in visualization, a steady stream of pies and donuts is fun but dumb. Use sparingly.

Visualize This example from chapter 5

Interactive graphics

Learning about pie charts was great fun even though I don’t like pie charts because Yau taught us how to use Protovis, a JavaScript library that yields interactive graphics. We built a pie chart just like the one(s) in Option B that popped up values when you mouse over the wedges. Protovis was developed at Stanford and has now morphed into the d3.js library. The packages developed in Protovis are still stable and usable. I highly recommend this exercise for anyone who wants to make infographics for the web. It helps to have a basic understanding of HTML going in.

What needs work

The overarching problem I had with Visualize This is that it spent relatively little time generating different types of graphics using the same data. We saw a little bit of that above when Yau used both a pie chart and a donut chart to visualize the same survey responses, but since donut charts are just variations on pie charts, it was not the best example in the book. The best example came when Yau visualized the age structure of the American population from 1860 – 2005 (I updated the end date to 2010 since I had access to 2010 census data).

First, Yau shows readers how to make this lovely stacked area graph in Illustrator. That’s right. No R. No Python. Just Illustrator.

Aging Americans

Aging Americans | Stacked area graph version

Then Yau admits that the stacked area chart has some general limitations:

One of the drawbacks to using stacked area charts is that they become hard to read and practically useless when you have a lot of categories and data points. The chart type worked for age breakdowns because there were only five categories. Start adding more, and the layers start to look like thin strips. Likewise, if you have one category that has relatively small counts, it can easily get dwarfed by the more prominent categories.

I tend to disagree that the stacked area chart ‘worked’ for displaying the age structure of the US population, but not because there were too many categories. I’ll get to why I don’t think the stacked area graph worked shortly, but first, let’s have a look at the same data represented in a line graph. This was Yau’s idea, and it was a good one. What we can see by looking at the data in a line graph rather than a stacked graph is the size ordering of these age slices. Yeah, I can kind of see that the 20-44 group was the biggest group in the stacked graph. But I had to think about it. In the line graph, I don’t wonder for a second which group was biggest. The 20-44 group is on top. The axes in line graphs just make more sense. I admit that the line graph is not an aesthetic marvel the way the area graph was. But, you know, you can figure out your own priorities. If you want pretty, go with the area graph and get smart about colors (with the wrong color scheme, any graphic can look awful. See also: what Excel generates automatically). If you want a graphic for thinking with, avoid stacked area graphs.

Aging Americans

Aging Americans | Line graph version

Coming back to what I think about visualizing the age structure of the American population. Call me old-fashioned, say that I adore my elders too much, I’ll just tell you we all stand on the backs of geniuses. I like the age pyramids for visualizing the age structure of a population. Here’s one I plucked from the Census website.

Population Aging in the United States | Traditional age pyramid graphic

The pyramid has these advantages:
   1. It shows gender differences. Males are on the left. Females are on the right.
   2. This graphic does a better job of showing the structure of the population because the older people appear to balance on the younger people. This is useful because the older people actually do kind of balance on the younger people when it comes to things like Social Security. The structure of the population does not come through in the area graph or the line graph. Both of those show us that there are more old people now than there were before but displaying more is a less sophisticated visual message than showing us just how many older people and how much older and how these things have changed over time. See all those and’s in the previous sentence? Yeah. That’s how much better the pyramid is.
   3. It is possible to see both the forest and the trees in this age pyramid. What do I mean? Well, the stacked area graph and the line graph had to lump rather large (and disproportionately sized) groups of ages together. In the age pyramid, the slices are even at every five years and if you happen to want to figure out just how the 20-24 year olds are changing over time, you can. But this granularity does not make it difficult to understand the overall structure of the pyramid.

To summarize my larger disappointment, I wish that Yau had gone through a number of examples of displaying the same data with different graphics in order to teach readers how to choose the best graphic. To his credit, he did visualize crime data with a bunch of different graphics, but I didn’t like any of the graphic types. I’m including the one I liked most, but it’s mostly for historical reasons. This type of chart, with its weird fanned-out pie wedges, is called a Nightingale chart and was developed in part by Florence Nightingale way back when information graphics didn’t exist. He visualized this same crime data with Chernoff faces and with star graphics, neither of which was interpretable, in my opinion.

US Crime Rates by State – Nightingale charts

Heatmaps

Unlike Chernoff faces, star charts, and Nightingale charts which I think are totally useless, heatmaps have promise as data visualizations. This is a good example of how I wished Yau would have started working hard to get the data to lash up better with the visualization. This is his final version of the heatmap of a whole bunch of different basketball game statistics with the players who were responsible for scoring, assisting, and rebounding (among many other things). I am a basketball fan. I went linsane last season. But I just do not get excited when I look at this heatmap because the visualization does not reveal any patterns. Ask yourself: would I rather have this information in a table? If the answer is yes, well, then you know there’s at least one other kind of representation besides this one that you would prefer if this is the data you are trying to display.

NBA heatmap via FlowingData

So what would I do? Well, I’d do a couple things. First, I would probably try restricting this heatmap to the top ten players or even to my favorite players. Throwing in 50 players and about 20 statistics per player without condensing anything means we are looking at 1000 data points. Ooof. So…if not cutting down the number of players, maybe put the scoring statistics in a different heatmap than all the other statistics (playtime, games played, rebounds, steals, blocks, turnovers, and so on). Maybe strip out the “attempts” and just leave the completed free throws, field goals, and three-pointers. I do not know if these things would have revealed patterns, I just know that the current graphic is still looking like a data soup to me.

Maps triumphant

Overall, this was a great how-to for data visualization and I want to end on an appropriately high note. One of the biggest wins in the book was Chapter 8 in which Yau walks us through the most meticulous and involved demo in the book. The payoff is big. He shows us how to use Google Maps and FIPS codes to make choropleths (these are large maps in which colors mated with numerical values fill in small, politically bounded units, usually counties but sometimes census tracts). He does not use ArcGIS, which is one of the reigning mapping tools on the market. But ArcGIS is expensive. And Yau shows us how to generate maps without spending a dime. You will have to spend some time. If you are a cartography geek or you follow the unemployment rate, you’ve probably already seen this graphic because it was widely circulated, for good reason.
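
The core move in any choropleth, mating a color with each unit's numerical value, can be sketched in a few lines. This is a minimal Python illustration and not Yau's Chapter 8 code: the class breaks, the palette, the FIPS-style codes, and the rates below are all placeholders I made up.

```python
from bisect import bisect_right

# Hypothetical class breaks (percent unemployed) and a light-to-dark palette.
# Neither comes from the book. Six colors cover the six bins around five breaks.
BREAKS = [2, 4, 6, 8, 10]
COLORS = ["#f2f0f7", "#dadaeb", "#bcbddc", "#9e9ac8", "#756bb1", "#54278f"]


def color_for(rate):
    """Pick the palette entry for the bin that rate falls into."""
    return COLORS[bisect_right(BREAKS, rate)]


def choropleth_colors(rates_by_fips):
    """Map each county code to the fill color for its rate."""
    return {fips: color_for(rate) for fips, rate in rates_by_fips.items()}


# Placeholder codes and rates, not real counties or real data.
rates = {"00001": 1.5, "00002": 5.0, "00003": 12.0}
print(choropleth_colors(rates))
# {'00001': '#f2f0f7', '00002': '#bcbddc', '00003': '#54278f'}
```

The remaining work, which is where the time goes, is getting each color onto the matching county shape in the map file.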

Unemployment map via FlowingData