Tag Archives: methods/use of data

Race, Gender, and Skin Color Dynamism in Comic Books

Lots of time and careful consideration go into the production of new superheroes and the revision of time-honored heroes. Subtle features of outfits aren’t changed by accident and don’t go unnoticed. Skin color also merits careful attention to ensure that the racial depiction of characters is consistent with their back stories, among other considerations. A colleague of mine recently shared an interesting analysis of racial depictions by a comic artist, Ronald Wimberly—“Lighten Up.”

“Lighten Up” is a cartoon essay that addresses some of the issues Wimberly struggled with in drawing for a major comic book publisher. NPR ran a story on the essay as well. In short, Wimberly was asked by his editor to “lighten” a character’s skin tone — a character who is supposed to have a Mexican father and an African American mother. The essay is about Wimberly’s struggle with the request and his attempt to make sense of how a potentially innocuous-seeming request might be connected with racial inequality.

In the panel of the cartoon reproduced here, you can see Wimberly’s original color swatch for the character alongside the swatch he was instructed to use for the character.

Digitally, colors are handled by what computer programmers refer to as hexadecimal IDs. Every color has a hexadecimal “color code”: a string of six hexadecimal digits (0–9 and A–F) preceded by the pound symbol (#), with each pair of digits recording the red, green, or blue component of the color as a value from 0 to 255. For example, computers understand the color white as the color code #FFFFFF and the color black as #000000. Hexadecimal is simply a compact way of writing binary—a way of turning colors into code so that computers can work with them. Artists might tell you that there are an infinite number of possibilities for different colors. But on a computer, color combinations are not infinite: with 256 possible values for each of the three channels, there are exactly 256³, or 16,777,216, possible colors. Hexadecimal IDs are an interesting bit of data and I’m not familiar with many social scientists making use of them (but see).
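To make the mechanics concrete, here is a quick sketch in Python (mine, not from the original post) of how a hex code unpacks into its red, green, and blue channels:

```python
# A hex color code packs three 8-bit channels -- red, green, blue --
# into six hexadecimal digits: #RRGGBB.
def hex_to_rgb(code):
    """Convert a code like '#FFFFFF' to an (r, g, b) tuple of 0-255 ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#FFFFFF"))  # white -> (255, 255, 255)
print(hex_to_rgb("#000000"))  # black -> (0, 0, 0)
print(256 ** 3)               # 16,777,216 possible colors
```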

There’s probably more than one way of using color codes as data. But one thought I had was that they could be an interesting way of identifying racialized depictions of comic book characters in a reproducible manner—borrowing from Wimberly’s idea in “Lighten Up.” Some questions might be:

  • Are white characters depicted with the same hexadecimal variation as non-white characters?
  • Or, are women depicted with more or less hexadecimal variation than men?
  • Perhaps white characters are more likely to be depicted in more dramatic and dynamic lighting, causing their skin to be depicted with more variation than non-white characters.

If any of this is true, it might also make an interesting data-based argument to suggest that white characters are featured in more dynamic ways in comic books than are non-white characters. The same could be true of men compared with women.

Just to give this a try, I downloaded a free eye-dropper plug-in that identifies hexadecimal IDs. I used the top 16 images in a Google Image search for Batman (white man), Amazing-man (black man), and Wonder Woman (white woman). Because many images alter skin tone with shadows and light, I tried to use the eye-dropper to select the pixel that appeared most representative of the skin tone of the face of each character depicted.
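The manual eye-dropper step could also be scripted. Below is a rough sketch using the Pillow imaging library; the `pixel_hex` helper and the demo swatch are my own illustration, and with real scans you would open an image file and pick a representative face pixel yourself:

```python
from PIL import Image

def pixel_hex(img, xy):
    """Return the hexadecimal color code of one pixel of a Pillow image."""
    r, g, b = img.convert("RGB").getpixel(xy)
    return "#{:02X}{:02X}{:02X}".format(r, g, b)

# Demo on a tiny solid-color image; with a real scan you would do
# something like Image.open("batman_01.png") and choose a cheek pixel.
swatch = Image.new("RGB", (1, 1), (224, 172, 105))
print(pixel_hex(swatch, (0, 0)))  # -> #E0AC69
```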

Here are the images for Batman with a clean swatch of the hexadecimal IDs for the skin tone associated with each image below:


Below are the images for Amazing-man with swatches of the skin tone color codes beneath:


Finally, here are the images for Wonder Woman with pure samples of the color codes associated with her skin tone for each image below:


Now, perhaps it was unfair to use Batman as a comparison, as his character is more often depicted at night than is Wonder Woman—a fact which might mean he is more often depicted in dynamic lighting than she is. But it’s an interesting thought experiment. Based on this sample, two things seem immediately apparent:

  • Amazing-man is depicted much darker when his character is drawn angry.
  • And Wonder Woman exhibits the least color variation of the three.

Whether this is representative is beyond the scope of the post. But it’s an interesting question. While we know that there are dramatically fewer women in comic books than men, inequality is not only a matter of numbers. Portrayal matters a great deal as well, and color codes might be one way of getting at this issue in a new and systematic way.
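As a sketch of what “systematic” might look like, here is one crude way to score the hexadecimal variation of a set of swatches. The `channel_spread` function and the swatch values are hypothetical illustrations of mine, not the colors sampled above:

```python
from statistics import pstdev

def channel_spread(codes):
    """Mean standard deviation of the R, G, B channels across hex codes."""
    rgbs = [tuple(int(c.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
            for c in codes]
    channels = list(zip(*rgbs))  # one tuple of values per channel
    return sum(pstdev(ch) for ch in channels) / 3

# Hypothetical swatch sets: one wide-ranging, one nearly uniform.
dynamic = ["#8D5524", "#C68642", "#E0AC69", "#F1C27D"]
flat = ["#F1C27D", "#EFC07B", "#F2C37F"]
print(channel_spread(dynamic) > channel_spread(flat))  # True
```

A character depicted under varied lighting should score higher on a measure like this than one drawn with the same flat tone in every panel.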

While the hexadecimal ID of an individual pixel of an image is an objective measure of color, it’s also true that color is in the eye of the beholder and we perceive colors differently when they are situated alongside different colors. So, obviously, color alone tells us little about individual perception, and even less about the social and cultural meaning systems tied to different hexadecimal hues. Yet, as Wimberly writes,

In art, this is very important. Art is where associations are made. Art is where we form the narratives of our identity.

Beyond this, art is a powerful cultural arena in which we form narratives about the identities of others.

At any rate, it’s an interesting idea. And I hope someone smarter than me does something with it (or tells me that it’s already been done and I simply wasn’t aware).

Originally posted at Feminist Reflections and Inequality by Interior Design. Cross-posted at Pacific Standard. H/t to Andrea Herrera.

Tristan Bridges is a sociologist of gender and sexuality at the College at Brockport (SUNY).  Dr. Bridges blogs about some of this research and more at Inequality by (Interior) Design.  You can follow him on twitter @tristanbphd.

Using OK Cupid to Teach Research Methods

We’ve highlighted the really interesting research coming out of the dating site OK Cupid before. It’s great stuff and worth exploring:

All of those posts offer neat lessons about research methods, too. And so does the video below of co-founder Christian Rudder talking about how they’ve collected and used the data. It might be fun to show in research methods classes because it raises some interesting questions like: What are different kinds of social science data? How can/should we manipulate respondents to get it? What does it look like? How can it be used to answer questions? Or, how can we understand the important difference between having the data and doing an interpretation of it? That is, the data-don’t-speak-for-themselves issue.

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

The Iron Cage in Binary Code: How Facebook Shapes Your Life Chances

There was a great article in The Nation last week about social media and ad hoc credit scoring. Can Facebook assign you a score you don’t know about but that determines your life chances?

Traditional credit scores like your FICO or your Beacon score can determine your life chances. By life chances, we generally mean how much mobility you will have. Here, we mean that a number created by third-party companies often determines whether you can buy a house or car, how much house or car you can buy, and how expensive buying that house or car will be for you. It can mean your parents not qualifying to co-sign a student loan for you to pay for college. These are modern iterations of life chances, and credit scores are part of them.

It does not seem like Facebook is issuing a score, or a number, of your creditworthiness per se. Instead they are limiting which financial vehicles and services are offered to you in ads based on assessments of your creditworthiness.

One of the authors of The Nation piece (disclosure: a friend), Astra Taylor, points out how her Facebook ads changed when she started using Facebook to communicate with student protestors from for-profit colleges. I saw the same shift when I did a study of non-traditional students on Facebook.

You get ads like this one from DeVry:


Although I suspect my ads were always a little different based on my peer and family relations. Those relations are majority black. In the U.S. context, that means it is likely that my social network has a lower wealth and/or status position as read through the cumulative historical impact of race on things like where we work, what jobs we have, what schools we go to, etc. But even with that, after doing my study, I got every for-profit college and “fix your student loan debt” financing scheme ad known to man.

Whether or not I know these ads are scams is entirely up to my individual cultural capital. Basically, do I know better? And if I do know better, how do I come to know it?

I happen to know better because I have an advanced education, peers with advanced educations, and I read broadly. All of those are also a function of wealth and status. I won’t draw out the causal diagram I’ve got brewing in my mind, but basically it would say something like, “you need wealth and status to get advantageous services offered to you on the social media that overlays our social world, and you need proximity to wealth and status to know when those services are advantageous or not”.

It is an interesting twist on how credit scoring shapes life chances. And it runs right through social media and how a “personalized” platform can never be democratizing when the platform operates in a society defined by inequalities.

I would think of three articles/papers in conversation if I were to teach this (hint, I probably will). Healy and Fourcade on how credit scoring in a financialized social system shapes life chances is a start:

providers have learned to tailor their products in specific ways in an effort to maximize rents, transforming the sources and forms of inequality in the process.

And then Astra Taylor and Jathan Sadowski’s piece in The Nation as a nice accessible complement to that scholarly article:

Making things even more muddled, the boundary between traditional credit scoring and marketing has blurred. The big credit bureaus have long had sidelines selling marketing lists, but now various companies, including credit bureaus, create and sell “consumer evaluation,” “buying power,” and “marketing” scores, which are ingeniously devised to evade the FCRA (a 2011 presentation by FICO and Equifax’s IXI Services was titled “Enhancing Your Marketing Effectiveness and Decisions With Non-Regulated Data”). The algorithms behind these scores are designed to predict spending and whether prospective customers will be moneymakers or money-losers. Proponents claim that the scores simply facilitate advertising, and that they’re not used to approve individuals for credit offers or any other action that would trigger the FCRA. This leaves those of us who are scored with no rights or recourse.

And then there was Quinn Norton this week on The Message talking about her experiences as one of those marketers Taylor and Sadowski allude to. Norton’s piece summarizes nicely how difficult it is to opt-out of being tracked, measured and sold for profit when we use the Internet:

I could build a dossier on you. You would have a unique identifier, linked to demographically interesting facts about you that I could pull up individually or en masse. Even when you changed your ID or your name, I would still have you, based on traces and behaviors that remained the same — the same computer, the same face, the same writing style, something would give it away and I could relink you. Anonymous data is shockingly easy to de-anonymize. I would still be building a map of you. Correlating with other databases, credit card information (which has been on sale for decades, by the way), public records, voter information, a thousand little databases you never knew you were in, I could create a picture of your life so complete I would know you better than your family does, or perhaps even than you know yourself.

It is the iron cage in binary code. Not only is our social life rationalized in ways even Weber could not have imagined but it is also coded into systems in ways difficult to resist, legislate or exert political power.

Gaye Tuchman and I talk about this full rationalization in a recent paper on rationalized higher education. At our level of analysis, we can see how measurement regimes not only work at the individual level but reshape entire institutions. Of recent changes to higher education (most notably Wisconsin removing tenure from state statute causing alarm about the role of faculty in public higher education) we argue that:

In short, the for-profit college’s organizational innovation lies not in its growth but in its fully rationalized educational structure, the likes of which are being touted in some form as efficiency solutions to traditional colleges that have only adopted these rationalized processes piecemeal.

And just like that we were back to the for-profit colleges that prompted Taylor and Sadowski’s article in The Nation.

Efficiencies. Ads. Credit scores. Life chances. States. Institutions. People. Inequality.

And that is how I read. All of these pieces are woven together and it’s a kind of (sad) fun when we can see how. Contemporary inequalities run through rationalized systems that are being perfected on social media (because it’s how we social), given form through institutions, and made invisible in the little bites of data we use for the critical minutiae that the Internet has made it difficult to do without.

Tressie McMillan Cottom is an assistant professor of sociology at Virginia Commonwealth University.  Her doctoral research is a comparative study of the expansion of for-profit colleges.  You can follow her on twitter and at her blog, where this post originally appeared.

Just for Fun: Is Truncating the Y-Axis Dishonest?

What do you think?
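For what it’s worth, the size of the distortion is easy to quantify: the apparent ratio between two bars depends entirely on where the axis starts. A quick sketch with invented numbers:

```python
def apparent_ratio(a, b, baseline):
    """How many times taller b looks than a when the bars start at `baseline`."""
    return (b - baseline) / (a - baseline)

lo, hi = 49.2, 50.3  # invented data: a real difference of about 2%
print(round(apparent_ratio(lo, hi, baseline=0), 2))   # -> 1.02
print(round(apparent_ratio(lo, hi, baseline=49), 2))  # -> 6.5
```

With the axis starting at zero the bars look nearly identical; truncated at 49, the second bar looks six and a half times taller.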


Thanks to @WyoWeeds!

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

Just for Fun: But… Can One Be Too Meta?

Thanks, xkcd.

The Patriots: Out-and-Out Liars? Or Just Outliers?

I’m not saying that the Patriots are out-and-out liars. But they are outliers.

The advantage of an underinflated ball, like eleven of the twelve footballs the Patriots used last Sunday, is that it’s easier to grip. Ball carriers will be less likely to fumble if they’re gripping a ball they can sink their fingers into.

We can’t go back and measure the pressure of balls the Patriots were using before the Colts game, but Warren Sharp (here) went back and dug up the data on fumbles for all NFL games since 2010.  Since a team that controls the ball and runs more plays has more chances to fumble, Sharp graphed the ratio of plays to fumbles (values in red squares in the chart below) along with the absolute number of fumbles (values in blue circles). The higher the ratio, the less fumble-prone the team was.


One of these things is not like the others.  That’s what an outlier is. It’s off the charts. It’s nowhere near the trend line. Something about it is very different. The variables that might explain the differences among the other data points – better players, better weather or a domed stadium, a pass-centered offense – don’t apply. Something else is going on.

As the graph shows, when the teams are rank-ordered on the plays/fumbles ratio, the difference between one team and the next is usually 0–2 points; there are only two gaps of 5 until the 9-point gap between #3 Atlanta and #2 Houston. From the second-best Texans to the Patriots, there’s a 47-point jump.

Sharp also graphed the data as a histogram.


It’s pretty much a bell curve centered around the mean of 105 plays-per-fumble. Except for that outlier. And the chart shows just how far out it lies.

The Patriots play in a cold-weather climate in a stadium exposed to the elements.  Yet their plays/fumble ratio is 50% higher than that of the Packers, 80% higher than the Bears. They have good players, but those players fumble less often for the Patriots than they did when they played for other NFL teams.
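One way to make “off the charts” precise is a z-score: how many standard deviations a value sits from the mean of the distribution. The sketch below uses invented plays-per-fumble ratios for a 32-team league; only the general shape (a cluster in the neighborhood of the post’s mean of 105, plus one far-out value) echoes Sharp’s chart:

```python
from statistics import mean, pstdev

# Hypothetical plays-per-fumble ratios for a 32-team league: most teams
# packed close together, with one team far above the rest.
league = [88, 92, 95, 97, 99, 100, 101, 102, 103, 104, 104, 105,
          105, 106, 106, 107, 108, 108, 109, 110, 110, 111, 112,
          113, 114, 115, 116, 118, 120, 125, 140, 187]

def z_score(x, data):
    """Standard deviations between x and the mean of data."""
    return (x - mean(data)) / pstdev(data)

print(round(z_score(187, league), 1))  # well past the usual outlier cutoffs
```

In this toy league every other team sits within about two standard deviations of the mean; the one outlier is several standard deviations out, which is exactly what “nowhere near the trend line” means.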

Usually, the statistical anomaly comes first – someone notices that US healthcare costs are double those of other nations – and then people try to come up with explanations.  In this case, it wasn’t until we had a possible explanatory variable that researchers went back and found the outlier. As Peter Sagal of “Wait, Wait, Don’t Tell Me” said, “The League became suspicious when a Patriots player scored a touchdown and instead of spiking the ball he just folded it and put it in his pocket.”

UPDATE, Jan. 28: Since I posted this, there has been some discussion of Sharp’s data (“discussion” is a euphemism – this is sports and the Internet, after all). If you’re really interested in pursuing this, try Advanced Football Analytics or this piece at Deadspin, “Why Those Statistics About The Patriots’ Fumbles Are Mostly Junk” (to repeat, “discussion” is a euphemism, and if you want more strongly voiced views, read the comments). One of the difficulties, I suspect, is that a fumble is a rare event. The difference between the teams with the surest grip and the most butterfingered is about one fumble every couple of games.

Cross-posted at Montclair SocioBlog.

Jay Livingston is the chair of the Sociology Department at Montclair State University. You can follow him at Montclair SocioBlog or on Twitter.

Just for Fun: The Folly of Two Data Points

Every year, at the first faculty meeting, representatives of the registrar tell us what percentage of the incoming class is [insert variable in which we are interested, such as American Indian, working class, international, etc.]. They compare it to last year’s percentage. This drives me crazy because they do so as if comparing the last two data points in a sequence is indicative of a trend. But determining whether there is a trend, and therefore whether the increase or decrease in the percentage of [insert variable in which we are interested] is meaningful relative to last year, requires more than two data points!
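The folly is easy to demonstrate by simulation. In the sketch below (my own toy setup, not real registrar data), a percentage that merely bounces around a stable average still shows an “increase” or a “decrease” between the last two years about half the time:

```python
import random

random.seed(0)

def last_two_say_up(n_years=10):
    """Simulate a flat-on-average percentage; compare the last two years."""
    series = [random.gauss(12.0, 1.5) for _ in range(n_years)]  # no real trend
    return series[-1] > series[-2]

trials = 10_000
ups = sum(last_two_say_up() for _ in range(trials))
print(ups / trials)  # close to 0.5: the two-point "trend" is a coin flip
```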

xkcd does an excellent job of illustrating just how two data points can be utterly meaningless, even wildly fallacious:


Other great xkcd cartoons: attribution and the in group, on statistical significance, correlation or causation, and the minimal group paradigm.

Originally posted in 2009.

Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.

Just For Fun: The Trouble with p<.05

In statistics, a little star next to a coefficient generally means that the result is statistically significant at the p<.05 level. In English, this means that if there were really no relationship, a result at least this strong would pop up by pure random chance less than 1 time in 20. In sociology, that’s generally considered good enough to conclude that the finding is “real.”

If one investigates a lot of relationships, however, this way of deciding which ones to claim as real has an obvious pitfall.  If you look at 20 possible but false relationships, chances are that one of them will be statistically significant by chance alone. Do enough fishing in a dead lake, in other words, and you’ll inevitably pull up some garbage.
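The arithmetic behind the pitfall is simple: when nothing is really going on, p-values are uniformly distributed, so the chance that at least one of 20 true-null tests comes up “significant” is 1 − 0.95²⁰, about 0.64. A quick sketch (my own illustration) checking that by simulation:

```python
import random

random.seed(1)

analytic = 1 - 0.95 ** 20
print(round(analytic, 2))  # -> 0.64

# Monte Carlo: draw 20 null p-values many times, count "discoveries".
trials = 20_000
hits = sum(any(random.random() < 0.05 for _ in range(20))
           for _ in range(trials))
print(round(hits / trials, 2))  # close to the analytic value
```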

Thanks xkcd, for making this funny:


Lisa Wade is a professor of sociology at Occidental College and the co-author of Gender: Ideas, Interactions, Institutions. You can follow her on Twitter and Facebook.