methods/use of data: big data

The recent controversial arrests at a Philadelphia Starbucks, where a manager called the police on two Black men who had only been in the store a few minutes, are an important reminder that bias in the American criminal justice system creates both large-scale, dramatic disparities and little, everyday inequalities. Research shows that common misdemeanors are a big part of this, because fines and fees can pile up on people who are more likely to be policed for small infractions.

A great example is the common traffic ticket. Some drivers who get pulled over get a ticket, while others get let off with a warning. Does that discretion shake out differently depending on the driver’s race? The Stanford Open Policing Project has collected data on over 60 million traffic stops, and a working paper from the project finds that Black and Hispanic drivers are more likely to be ticketed or searched at a stop than white drivers.

To see some of these patterns in a quick exercise, we pulled the project’s data on over four million stop records from Illinois and over eight million records from South Carolina. These charts are only a first look—we split the recorded outcomes of stops across the different codes for driver race available in the data and didn’t control for additional factors. However, they give a troubling basic picture about who gets a ticket and who drives away with a warning.

[Charts: stop outcomes by driver race in Illinois and South Carolina]

These charts show more dramatic disparities in South Carolina, but a larger proportion of white drivers who were stopped got off with warnings (and fewer got tickets) in Illinois as well. In fact, with millions of observations in each data set, differences of even a few percentage points can represent hundreds, even thousands of drivers. Think about how much revenue those tickets bring in, and who has to pay them. In the criminal justice system, the little things can add up quickly.
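
For readers who want to try this kind of first look themselves, here is a minimal sketch of the tabulation in Python. It assumes a local copy of one state's file from the Stanford Open Policing Project; the file name and the driver_race and stop_outcome column names are placeholders, and the actual names vary across data releases.

```python
import pandas as pd

# Rough tabulation of stop outcomes by recorded driver race.
# "il_statewide.csv" and the column names below are placeholders;
# check the column layout of the data release you download.
stops = pd.read_csv("il_statewide.csv", usecols=["driver_race", "stop_outcome"])

# Share of each outcome (citation, warning, etc.) within each racial group
outcome_shares = pd.crosstab(
    stops["driver_race"], stops["stop_outcome"], normalize="index"
)
print(outcome_shares.round(3))
```

Like the charts above, a table like this is only descriptive; it does not control for where, when, or why drivers were stopped.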

More social scientists are pointing out that the computer algorithms that run so much of our lives have our human, social biases baked in. This has serious consequences for determining who gets credit, who gets parole, and all kinds of other important life opportunities.

It also has some sillier consequences.

Last week NPR host Sam Sanders tweeted about his Spotify recommendations:

Others quickly chimed in with screenshots of their own. Here are some of my mixes:

The program has clearly learned to suggest music based on established listening patterns and norms from music genres. Sociologists know that music tastes are a way we build communities and signal our identities to others, and the music industry reinforces these boundaries in their marketing, especially along racial lines.

These patterns highlight a core sociological point: social boundaries large and small emerge from our behavior even when nobody is trying to exclude anyone. Algorithms accelerate this process through the sheer number of interactions they can observe at any given time. It is important to remember the stakes of these design quirks when talking about new technology. After all, if biased results come out, the program probably learned it from watching us!

Evan Stewart is an assistant professor of sociology at University of Massachusetts Boston. You can follow his work at his website, or on BlueSky.

One of the big themes in social theory is rationalization—the idea that people use institutions, routines, and other formal systems to make social interaction more efficient, but also less flexible and spontaneous. Max Weber famously wrote about bureaucracy, especially how even the most charismatic or creative individuals would eventually come to rely on stable routines. More recent work highlights just how much we depend on rationalization in government, at work, and in pop culture.

With new tools in data analysis, we can see rationalization at work. Integrative Biology professor Claus Wilke (@ClausWilke) recently looked at a database of IMDB movies released since 1920. His figure (called a joyplot) lets us compare distributions of movie run times and see how they have changed over the years.

While older films had much more variation in length, we can see a clear pattern emerge where most movies now come in just shy of 100 minutes, and a small portion of short films stay under 30 minutes. The mass-market movie routine has clearly come to dominate as more films stick to a common structure.
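
For anyone curious how a comparison like this is built, here is a rough Python sketch. It is not Wilke's code; it assumes a hypothetical CSV with one row per film and columns named year and runtime_minutes, and it simply stacks one histogram per decade, joyplot-style.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: one row per film, with "year" and "runtime_minutes"
# columns. The file name and column names are placeholders.
films = pd.read_csv("film_runtimes.csv")
films = films[(films["year"] >= 1920) & (films["runtime_minutes"].between(1, 240))].copy()
films["decade"] = (films["year"] // 10) * 10

bins = np.arange(0, 245, 5)
fig, ax = plt.subplots(figsize=(8, 6))

# Stack one run-time distribution per decade, oldest at the bottom
for i, (decade, group) in enumerate(films.groupby("decade")):
    density, _ = np.histogram(group["runtime_minutes"], bins=bins, density=True)
    ax.fill_between(bins[:-1], i, i + density * 40, step="post", alpha=0.7)
    ax.text(2, i + 0.1, str(int(decade)))

ax.set_xlabel("Run time (minutes)")
ax.set_yticks([])
plt.show()
```

The vertical scaling factor is arbitrary; it just keeps neighboring decades' curves readable.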

What’s most interesting to me is not just these two peaks, but that we can also see the disappearance and return of short films between 1980 and 2010 and some smoothing of the main distribution after 2000. Weber thought that new charismatic ideas could arise to challenge the rationalized status quo, even if those ideas would eventually become routines themselves. With the rise of online distribution for independent films, we may be in the middle of a new wave in charismatic cinema.

Evan Stewart is an assistant professor of sociology at University of Massachusetts Boston. You can follow his work at his website, or on BlueSky.

Originally posted at Scatterplot.

There has been a lot of great discussion, research, and reporting on the promise and pitfalls of algorithmic decisionmaking in the past few years. As Cathy O’Neil nicely shows in her Weapons of Math Destruction (and associated columns), algorithmic decisionmaking has become increasingly important in domains as diverse as credit, insurance, education, and criminal justice. The algorithms O’Neil studies are characterized by their opacity, their scale, and their capacity to damage.

Much of the public debate has focused on a class of algorithms employed in criminal justice, especially in sentencing and parole decisions. As scholars like Bernard Harcourt and Jonathan Simon have noted, criminal justice has been a testing ground for algorithmic decisionmaking since the early 20th century. But most of these early efforts had limited reach (low scale), and they were often published in scholarly venues (low opacity). Modern algorithms are proprietary, and are increasingly employed to decide the sentences or parole decisions for entire states.

“Code of Silence,” Rebecca Wexler’s new piece in Washington Monthly, explores one such influential algorithm: COMPAS (also the subject of an extensive, if contested, ProPublica report). Like O’Neil, Wexler focuses on the problem of opacity. The COMPAS algorithm is owned by a for-profit company, Northpointe, and the details of the algorithm are protected by trade secret law. The problems here are both obvious and massive, as Wexler documents.

Beyond the issue of secrecy, though, one issue struck me in reading Wexler’s account. One of the main justifications for a tool like COMPAS is that it reduces subjectivity in decisionmaking. The problems here are real: we know that decisionmakers at every point in the criminal justice system treat white and black individuals differently, from who gets stopped and frisked to who receives the death penalty. Complex, secretive algorithms like COMPAS are supposed to help solve this problem by turning the process of making consequential decisions into a mechanically objective one – no subjectivity, no bias.

But as Wexler’s reporting shows, some of the variables that COMPAS considers (and apparently considers quite strongly) are just as subjective as the process it was designed to replace. Questions like:

Based on the screener’s observations, is this person a suspected or admitted gang member?

In your neighborhood, have some of your friends or family been crime victims?

How often do you have barely enough money to get by?

Wexler reports on the case of Glenn Rodríguez, a model inmate who was denied parole on the basis of his puzzlingly high COMPAS score:

Glenn Rodríguez had managed to work around this problem and show not only the presence of the error, but also its significance. He had been in prison so long, he later explained to me, that he knew inmates with similar backgrounds who were willing to let him see their COMPAS results. “This one guy, everything was the same except question 19,” he said. “I thought, this one answer is changing everything for me.” Then another inmate with a “yes” for that question was reassessed, and the single input switched to “no.” His final score dropped on a ten-point scale from 8 to 1. This was no red herring.

So what is question 19? The New York State version of COMPAS uses two separate inputs to evaluate prison misconduct. One is the inmate’s official disciplinary record. The other is question 19, which asks the evaluator, “Does this person appear to have notable disciplinary issues?”

Advocates of predictive models for criminal justice use often argue that computer systems can be more objective and transparent than human decisionmakers. But New York’s use of COMPAS for parole decisions shows that the opposite is also possible. An inmate’s disciplinary record can reflect past biases in the prison’s procedures, as when guards single out certain inmates or racial groups for harsh treatment. And question 19 explicitly asks for an evaluator’s opinion. The system can actually end up compounding and obscuring subjectivity.

This story was all too familiar to me from Emily Bosk’s work on similar decisionmaking systems in child welfare, where caseworkers must answer similarly subjective questions about parental behaviors and problems in order to produce a seemingly objective score used in decisions about removing children from the home in cases of abuse and neglect. A statistical scoring system that takes subjective inputs (and it’s hard to imagine one that doesn’t) can’t produce a perfectly objective decision. To put it differently: this sort of algorithmic decisionmaking replaces your biases with someone else’s biases.
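
To make that point concrete, here is a deliberately toy scoring function in Python. COMPAS is proprietary, so none of this reflects the real model; the invented weights simply illustrate how one heavily weighted subjective item, like question 19, can move an otherwise identical case across a ten-point scale.

```python
# Toy illustration only: COMPAS is proprietary, so this invented scoring
# function is NOT the real model. It shows how a single subjective input,
# if weighted heavily, can swing a seemingly objective score.

def toy_risk_score(official_infractions: int, screener_sees_disciplinary_issues: bool) -> int:
    """Return a 1-10 'risk decile' from one objective and one subjective input."""
    points = min(official_infractions, 5)        # capped objective component
    if screener_sees_disciplinary_issues:        # the evaluator's opinion
        points += 7                              # arbitrary, heavy weight
    return max(1, min(10, points))

# Same official disciplinary record, different screener opinion:
print(toy_risk_score(1, True))   # 8
print(toy_risk_score(1, False))  # 1
```

The objective input barely matters here; the subjective one decides the outcome, which is the dynamic Wexler describes.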

Dan Hirschman is a professor of sociology at Brown University. He writes for scatterplot and is an editor of the ASA blog Work in Progress. You can follow him on Twitter.