
In December, the Center for Data Innovation sent out an email titled “New in Data: Big Data’s Positive Impact on Underserved Communities is Finally Getting the Attention it Deserves,” containing an article by Joshua New. New recounts the remarks of Federal Trade Commissioner Terrell McSweeny at a Google Policy Forum in Washington, D.C. on data for social empowerment, a talk that proudly lists examples of data doing good. As I read through the examples of big data “serving the underserved”, I was first confused and then frustrated. Though to be fair, I went into it a little annoyed by the title itself.

The idea of big data for social good is not “new in data”: big data have been in the news and a major concern for research communities around the world since 2008. One of the primary justifications, whether spoken or implicit, is that data will solve all of humanity’s biggest crises. Big data do not “deserve” attention: they are composed of inanimate objects without needs or emotions. And, while big data are having an “impact” on underserved communities, it is certainly not the unshakably positive, utopian impact that the title promises.

Big data are a complex of technologies designed, implemented, and controlled by those in power to measure and observe everyone else. Period. Big data are not serving the underserved; they serve the elites who design the systems. Where the goals of those controlling the system align with those of the underserved, then sure, big data do good. But when the goals and needs of stakeholders are in competition, the people’s needs are quickly forgotten. For something to “serve” a community, the community must have input into what the project or technology is and have at least partial control over its implementation. These are two easy heuristics: Who has input? Who has control? Effective and just policy solutions do not come from isolated conversations inside the beltway, from the ivory tower, or from Silicon Valley incubators; solutions are built from working one-on-one with communities in need. Take the well-known example of mass government surveillance systems. In the United States, the National Security Agency justified surveillance programs as necessary for “the common good”, to protect the rights and freedoms of US citizens. Yet the surveillance programs and the collection of more and more data have trumped individual privacy rights countless times. The American people have no input or control over the system, so it does not serve them.

Despite my dubious reading of the email title, I clicked onward to the substance of the article, wherein my concerns were substantiated. The examples provided to support McSweeny’s claims that big data serve the underserved completely miss the mark. These evidentiary cases, which I analyze below, leave “the underserved community” ill-defined, never clarifying whether the community is isolated rural families, poor individuals, minorities, or the elderly, and represent projects that operate without community input or buy-in.

1. In California, smart meters are “enabling local authorities to enforce restrictions on water use” among homeowners. Analysis: Boosting law enforcement’s capability to monitor and fine homeowners over their water usage does not serve underserved communities. Additionally, the monitoring of individual families ignores the fact that the vast majority of California’s water is used by agriculture and industry.

2. The Federal Government is using census data to target services, such as “providing prospective college students and their families information that can lead to better financial planning decisions.” Analysis: Targeted marketing of student loan and grant information serves the needs of lenders more than future students, and this project fails to address the rising costs of higher education; instead, it perpetuates student debt. Additionally, completing the census is a legal requirement for US residents, who have no input into how that data is eventually used for both public services and private marketing.

3. The Federal Trade Commission hosted a hackathon (a competition to build a software solution in a short period of time) to solve the “problem of automated telemarketing calls”, with the winning solution stopping “36 million unwanted robocalls.” Analysis: Stopping robocalls does not serve the underserved; it uses FTC resources to solve a first-world annoyance. This project raises the issue of who gets to decide which issues are worth pursuing. Rather than crowdsourcing a serious public health or safety issue, the FTC focused its energy on a minor inconvenience.

While the examples listed in this talk are Government programs, it is interesting to note the cozy relationship between private, for-profit industry partners and public decision makers. McSweeny admits this profit motivation early on in her talk, saying, “Data is the fuel that powers much of our technological progress, and provides innovators with the raw materials they need to make better apps, better services, and better products.” The relationship between private and public partners is so cozy, in fact, that each example of government programs serving communities makes an ideological conflation: it assumes that “serving underserved communities” occurs whenever the government uses data to save money. Making a profit is not an inherently bad thing, but it can come into direct conflict with the needs of underserved communities. For one, these communities frequently need specialized services that are not generalizable, making them cost-prohibitive from a market standpoint; yet those costly services should still be delivered. For example, medical care for those suffering from rare diseases or translation services for non-English speakers in public schools are expensive yet necessary services. A government for the people should be motivated beyond market concerns.

The tone taken in the body of the Center for Data Innovation article is not surprising. As a think tank (read: lobbyists) located in Washington, D.C., the Center’s purpose is “capitalizing” on the “enormous social and economic benefits” of big data by supporting “public policies designed to allow data-driven innovation to flourish”. Allowing innovation to flourish, frequently without scrutiny or oversight, is PR speak for “get out of our way and let us do what we want.” The Center’s goal of reducing barriers to data innovation means a reduction in oversight and consumer protections.

The article concludes, “It was heartening to hear such an esteemed figure in Washington policy community so clearly articulate how data is empowering individuals.” Yet none of the examples empower individuals! No underserved communities had a voice or any control over the implementation of the data system. As an individual in a privileged position, I cannot and should not make claims to know what a perfect big data project for an underserved community is, and that is the point. I am not in a position to know the specific needs of underserved communities. Instead of assuming I or anyone else in a position of power knows better, “big data for social good” projects should go out and actually ask their target community what their biggest concerns are and involve them in the creation of projects that respond to those needs. Agency is a fundamental condition of fulfilling an individual’s needs, and until a big data project asks for input from and gives control to the people it claims to serve, it is failing them.

Candice Lanius is a PhD Candidate in the Department of Communication and Media at Rensselaer Polytechnic Institute who gets annoyed every time she hears someone say “The data speaks for itself!”

Header image source: Jeremy Keith



This year I was able to attend the Grace Hopper Celebration of Women in Computing for the second time. As only a casual programmer, I am an odd attendee, but the event supports a cause I care deeply about: getting and retaining more women in technology and engineering roles. GHC is an exhilarating mixture of famous keynote speakers, girl power workshops, tech demonstrations, and a “swag” filled career fair. On Day 1 I was definitely into it:

https://twitter.com/Misclanius/status/654294038439292929

But this year, unlike last year, the shiny newness of GHC had worn off a bit, and I started to notice a few things that bothered me. As a mainstream conference, GHC is made palatable to the widest possible audience of women, men, and businesses (because in America, they count as people too!). Being palatable means the conference doesn’t critically engage with many important issues and is therefore open to a variety of critiques. For one, its feminism is of the Lean In© variety, so it doesn’t really engage with the intersection of race, class, and gender [PDF] in tech companies. GHC also supports an at-times obscene and gratuitous display of wealth, with “The Big Three” of Silicon Valley (Google, Facebook, and Apple) competing to outspend each other in putting on the biggest recruitment show. Maybe I have the start of a series of posts on my hands…

Today, however, I want to talk about ethics. That is, I want to talk about how we talk about ethics at tech conferences. First: Yes, we should be talking about ethical issues at technology conferences. As Cyborgology’s writers have explored numerous times [1] [2] [3] [4], technology has politics, and therefore we need to talk about ethics—a system which guides actors in right and wrong conduct—to improve the outcomes of technological innovation. So why not have that conversation in the belly of the beast, at a tech conference—a space that is designed to showcase new developments and recruit new workers?

https://twitter.com/Misclanius/status/654307489307979776

How do we actually have a conversation about ethics? Before answering, let me share the ways the conversation is currently going at GHC:

1. Speakers ignore ethics or politics completely.

I would share examples of speakers ignoring ethics and politics completely, but there are word limits to these essays.

2. Speakers joke about ethics, then bracket that conversation for another time or place.

On Thursday afternoon, Dr. Danelle Shah from MIT’s Lincoln Laboratory gave a talk on “Follow, #Hashtag and @Mention: Mining Social Media for Disaster Response.” To begin her presentation, Dr. Shah joked that there are serious implications for her research, but she wouldn’t “touch the policy and ethical implications with a 10 foot pole”. As her audience of several hundred women laughed, the pit of my stomach fell. Essentially, Dr. Shah is taking social media data in cases where the user chose to use an alias and/or chose not to disclose their geographic location, and then her team is running an algorithm that infers the user’s identity and the geographic location of their home “with 80% accuracy within 10 meters”. Let me repeat that: Users have intentionally chosen not to use their IRL names or provide their geolocation to Twitter, and this team has created a tool that generates that information despite a clearly stated preference by the individual. The user has no control over their information. But, as Dr. Shah explained, the tool is created for disaster response. Previously, “we hoped that enough people have GPS and tagging on to be able to see a signal”, but now a much larger section of the data is useful, so “we can rescue people at their home.” I left the session wanting to discuss: Should there be a right, when actively choosing not to share information, to not have that information inferred by others? Should there be an individual right to control the actions taken on your personal data?

The presentation, despite claiming to bracket the ethics and politics for another time and place, has implicit references to ethical and political issues throughout. For one, this research is funded by the Department of Defense through a DARPA grant, and military funding is highly political. Dr. Shah’s conclusion also makes a claim for consequentialist ethics, that “the ends justify the means”, a hotly debated system for determining right from wrong action. Yet, by joking about ethics at the start of the presentation, Dr. Shah has primed the audience to ignore and accept these positions without question. In short, there is no such thing as having the conversation about ethics and politics at another time and place because ethics and politics are always present.

3. Speakers agree to have the conversation, but it starts from scratch and with no forethought.

On Wednesday, I was very excited to attend a Birds of a Feather session on Ethics and Morality in Virtual and Augmented Reality led by Katherine Harris and Olivia Erickson, two “Developer Evangelists of Microsoft”. Because it was a BoF session, Harris and Erickson’s roles were “to open up the discussion of what sort of ethical and moral questions will need to be discussed as virtual and augmented reality enters the consumer arena at a large scale.” While there were several interesting points raised by audience members, the vast majority of responses suffered from two problems. First, many answers were offered without historical knowledge of the contested nature of current regulatory structures. These answers presumed that if an action is currently legal, it is necessarily an ethical action. The second major limitation was that many participants had simply never thought about the issues before, so they were visibly and verbally forming opinions on the spot: “Well”, “I guess”, “Maybe”, “Hmmm”, etc. These two problems combine to create a dangerous tabula rasa approach to ethical issues in which existing practices get recreated in perpetuity without critical reflection. Ethics are an area for rigorous inquiry that must incorporate historical knowledge and debate.

So, how should we talk about ethics at a tech conference?

I have a few ideas for how we might actually have a conversation about ethics, but it falls on two different groups to make these critical conversations happen. For one, there is too much “siloing” between fields of specialization in academia and the private sector. Ethicists and humanists need to attend these professional and public meetings to facilitate and support conversations about ethics and technological innovation.

For those individuals who are already attending tech conferences, they need to be aware of ethical issues, talk about them, and recognize that ethics don’t just emerge, perfectly formed from a single individual’s sense of right and wrong. Ethical systems are historically situated and should be debated by the community.

If you are interested in discussing these issues, I serve as a co-chair of the Ethics and Social Aspects of Data Interest Group of the Research Data Alliance, formed in late 2014. We started with the basics, like when, where, and who should be talking about ethics around data and data sharing (Everyone!), and we are now working on a community-built, annotated bibliography of previous work and resources on the ethics of data. These topics and issues are also covered by danah boyd’s research institute Data & Society. The NSF’s Council for Big Data, Ethics, and Society is also building an expanded network of individuals interested in the ethical and social issues that emerge from a changing data landscape.

Candice Lanius is a PhD Student in the Department of Communication and Media at Rensselaer Polytechnic Institute who still gets annoyed every time she hears someone say “The data speaks for itself!” (A statement which should technically be “The data speak for themselves”, but my advisor isn’t reading this, so I can use the colloquial version if I want to!)

The tools of my self-disciplining.

The quantified self (QS) movement advertises itself as a way for individuals interested in tracking their daily lives to use sensors and computing technology to monitor their activities, whether those activities involve biological processes or social actions, to better understand their habits and improve upon them. The tracking and use of personal data through proprietary sensing and software platforms is generally accepted as part of the benign “datification” of everyday life. These services span almost every activity, from making grocery shopping more efficient (Grocery IQ) to monitoring levels of physical activity (Fitbit). Many authors have made insightful criticisms and observations about the contemporary datification landscape as a system. Notably, Frank Pasquale, in The Black Box Society, writes about the increasing commercialization and sale of users’ data, their “digital reputation,” in the opaque world of the data-as-insight industrial complex. This is a valuable systemic critique, yet I am more interested in the personal effects of self-quantification. I argue that the use of self-monitoring and tracking technologies can create anxiety around the data-capturing process. Tracking technologies create an ordering of people and experiences that discourages moments which are not easily quantified.

I personally began to consistently collect data on my daily activities during 2014 as part of a summer internship. I was downloading and testing digital applications to see what degree of functionality was common to the most popular applications on the market. While most of the apps were for media consumption, I quickly became fixated on the self-tracking and self-improvement category. Whether the app was a task planner, exercise log, or nutritional diary, I was excited to see trends in my behavior beyond the one- or two-week window that my own memory could recount and contextualize. Initially, the experience was great! I used my Soleus GPS watch for running and hiking, MyFitnessPal for nutrition and calories, and MyMinutes for regulating the time I dedicated to a list of ongoing projects throughout the day. After the novelty wore off, however, I found that self-tracking and self-monitoring created restrictions on my life.

The most obvious problem was the amount of time—sometimes a few seconds, but in other instances, up to 20 minutes—I spent preparing and actually capturing my data. For the GPS watch to work, I had to acquire a signal before I could begin a run, and I don’t even want to begin recounting the runs where I lost the satellite signal midway. For the dietary tracker, there were thousands of pre-entered product entries, but I tend to buy on-sale or off-brand items (graduate student on a budget here), which meant I spent copious amounts of time inputting every nutritional detail. Additionally, for any food items that were not clearly portioned (serving sizes described solely in ounces), I felt compelled to purchase a food scale and measure ingredients before assembling them for my meal. The effort put into tracking “work sprints” and overall productivity cost me precious minutes of break time. While the time spent recording my data grew to be annoying, wasted time was not the only or most insidious effect self-tracking applications had on me.

The worst part was the anxiety I felt every time I encountered instances of “poor” data, such as missed or incomplete information, and activities which are not easily recorded or even quantifiable. Of course the manufacturers and developers do not discuss what happens when their product is not operating smoothly, yet even if the application is working 100% as designed, it exerts force on the daily habits of the user in unexpected ways. Those forces can create a great deal of anxiety, leading the individual to be less spontaneous and to avoid unknown or unquantifiable situations. My GPS watch had problems maintaining a signal, which meant I slowly stopped running in forests or on pathways that strayed from roadways. For the nutrition tracking app, I had to politely smile and decline my friend’s homemade brownies, even with my stomach growling, because I had no way to know what she put in them. The productivity tracker was the first app I stopped using (almost immediately) because an application that purports to help with work/life balance should not interfere with break time.

To better understand the source of my anxiety, I first asked: Where does the urge to quantify my behavior come from? I turned to Foucault. Foucault’s Discipline and Punish traces the historical development of the carceral system, which uses public institutions to monitor and control the activities of citizens. While this historical trajectory begins with discipline in the prison system, the urge to control and monitor envelops even the non-criminal activities of daily life, such as public education and health. These outside forces guide citizens to become ideal subjects, and these forces are slowly internalized so that individuals want to become the best version of themselves as approved by the state; we want to score well on standardized tests; we want to perform well on state-mandated physical fitness tests. This continues today with the integration of our own personal technologies of discipline (self-tracking and self-monitoring applications), which track, observe, and ultimately change our bodies and behaviors. Technologies of discipline go hand-in-hand with the transition towards data-driven assessment and an objectivity that trumps the anecdotes or impressions of the individual. Experts and expertise embedded in technology are positioned as authorities over the individual’s body. Therefore, my anxiety is a result of not being able to reliably capture my data while simultaneously feeling compelled to do so. Without reliable and complete measurements, I cannot become the ideal version of myself.

Looking at the urge to self-quantify through a Foucauldian lens opens up other avenues for understanding why individuals broadcast their personal information online. The desire to become an ideal subject is why self-monitoring applications and “personal progress” are not kept private. Many individuals record their behaviors and activities and then share them with others using social media. A common criticism of sharing is that the practice is a form of narcissism, yet that claim offers a simplistic understanding of the urge to share metrics about ourselves. The ideal subject has always been a public subject. Deviancy is what hides in the shadows. Sharing is part of the need to reveal the body for public scrutiny to prove that the individual is a good person.

Candice Lanius is a PhD student in the Department of Communication and Media at Rensselaer Polytechnic Institute who gets annoyed every time she hears someone say “The data speaks for itself!”

Website: https://clanius.wordpress.com/

Twitter: @Misclanius

Today we’re reposting our most popular guest post of the year. This essay has garnered a lot of attention and for good reason: it speaks directly to a kind of liberal racism that is endemic to the institutions and professions that see themselves as the good guys in this problem. -db

Police in riot gear at the Ferguson protests.

This past December, most major American news outlets ran a story about police shooting statistics and race. No matter where they were situated on the political spectrum, journalists, pundits, and researchers tried to answer the question: Are American police disproportionately targeting and killing black people? The answers were universally supported by data, statistics, claims of objectivity, and a rhetoric of uncomfortable truths. Their conclusions, however, were all over the map.

Nicholas Kristof, writing at The New York Times, presented a long list of statistical measures showing that racial discrimination is alive and well in America in the first of his five-part series “When Whites Just Don’t Get It”. Bill O’Reilly over at Fox News argued the exact opposite was true with his own set of empirical measures in his segment “What Ferguson Protestors Accomplished”. MSNBC, USA Today, and CNN also joined the debate with their own experts and incompatible data projections.

CNN journalist Eric Bradner explains that these contradictory results are a paradox: “Two dramatically different statistics – and they could both be right.” According to Bradner’s “Factcheck”, Kristof builds his conclusions from the Federal Bureau of Investigation’s Supplementary Homicide Reports while O’Reilly’s analysis comes from the Centers for Disease Control and Prevention; both are incomplete records of police homicides. According to Bradner, the problem is a result of varying definitions of cause of death while in police custody, whether natural, suicide, or homicide. Additionally, there is no incentive for police to self-report their own troubled behavior. He concludes that the different police homicide statistics highlight the importance of the US federal government collecting or mandating the reporting of police shootings. Once all of these cases were verified in a database, they would reveal the “definitive trends in police shootings”. Bradner’s logic shows his trust that more information collected by the government will automatically reveal the truth. Sadly, if this solution to mandate the reporting of police shootings were implemented, it would not eliminate racism in America or even alleviate the debate over whose statistics are correct. There would still be an infinite cycle of analysis, fact checks, and responses.

That is because statistics are a method that requires constant choices from the analyst, choices that are ideologically charged. There is a range of mathematically appropriate choices that are selected or overlooked according to the person constructing them. A basic example of this is the use of measures of central tendency: the mean, median, and mode all offer a summary position of the midpoint of a dataset, but depending on the context, one will be better than the others at offering clarity to a situation. This clarity is, of course, always a simplification, skimming the surface of situations. When complexity is erased, the surface is inscribed with the analyst’s view of the world and their beliefs about what is plausible. None of the measures of central tendency are “wrong” either mathematically or realistically, yet they are couched in a discourse of objectivity and reliability, and that makes them a dangerous technology.
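To make the point concrete, here is a minimal sketch (in Python, with entirely made-up numbers) of how the choice among these equally “correct” measures changes the story a dataset tells:

```python
import statistics

# Hypothetical household incomes (in thousands of dollars) for a small
# neighborhood: most households earn modestly, one earns a great deal.
incomes = [22, 25, 25, 28, 30, 32, 35, 40, 250]

print("mean:  ", round(statistics.mean(incomes), 1))  # ~54.1, pulled upward by the outlier
print("median:", statistics.median(incomes))          # 30, the middle household
print("mode:  ", statistics.mode(incomes))            # 25, the most common value
```

All three numbers are mathematically valid summaries of the same data, yet an analyst who wants the neighborhood to look prosperous can report the mean, while one who wants it to look struggling can report the mode; the choice, not the math, carries the ideology.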

Using statistics to talk about racialized police aggression accepts that the truth cannot be found among its victims. This is not to say that the ideological potential hidden within statistical analysis is all “bad”. Statistics were first used as a tool of the state and the ruling elite, yet that does not mean that statistics cannot be used to further a liberatory cause. Their power can move across and through hierarchical power structures (i.e., power is circular), and it limits the actions of elites as well as less fortunate people. For an excellent essay on this topic, read Ian Hacking’s (1980) piece “How should we do the history of statistics?”

The demand for statistical proof started as a response to urbanization in 18th-century Europe; it was suddenly possible for two individuals living in large cities to never meet or share similar experiences. Theodore Porter, in his 1995 book Trust in Numbers, explores the history of quantification and statistics in European and American public life. By looking at a diversity of governmental forms (monarchy, democracy, and autocracy) and various academic disciplines (actuarial sciences, political economy, and social engineering), Porter uncovers a common process whereby statistics are adopted as part of “strategies of communication”. Quantification is a “technology of trust” that creates a common language connecting different professions, disciplines, and communities.

Perhaps statistics should be considered a technology of mistrust—statistics are used when personal experience is in doubt because the analyst has no intimate knowledge of it. Statistics are consistently used as a technology of the educated elite to discuss the lower classes and subaltern populations, those individuals who are considered unknowable and who are not trusted to deliver their own accounts of their daily lives. A demand for statistical proof is blatant distrust of someone’s lived experience. The very demand for statistical proof is otherizing because it defines the subject as an outsider, not worthy of the benefit of the doubt.

What does this look like in practical terms? A white woman can say that a neighborhood is “sketchy” and most people will smile and nod. She felt unsafe, and we automatically trust her opinion. A black man can tell the world that every day he lives in fear of the police, and suddenly everyone demands statistical evidence to prove that his life experience is real. Anything approaching a “post-racial society” would not require different types of evidence to tell our life stories: anecdotal evidence for white people, statistics for black people. To the media that’s constantly demanding that lived experiences be backed up by statistics, here’s a fact check of your own: Your demand for statistical proof is racist.

Candice Lanius is a PhD student in the Department of Communication and Media at Rensselaer Polytechnic Institute who gets annoyed every time she hears someone say “The data speaks for itself!”

Website: https://clanius.wordpress.com/

Twitter: @Misclanius