In December, the Center for Data Innovation sent out an email titled, “New in Data: Big Data’s Positive Impact on Underserved Communities is Finally Getting the Attention it Deserves” containing an article by Joshua New. New recounts Federal Trade Commissioner Terrell McSweeny’s remarks on data for social empowerment, delivered at a Google Policy Forum in Washington, D.C., which proudly list examples of data-doing-good. As I read through the examples provided of big data “serving the underserved”, I was first confused and then frustrated. Though to be fair, I went into it a little annoyed by the title itself.
The idea of big data for social good is not “new in data”: big data have been in the news and a major concern for research communities around the world since 2008. One of the primary justifications, whether stated outright or left implicit, is that data will solve all of humanity’s biggest crises. Big data do not “deserve” attention: they are composed of inanimate objects without needs or emotions. And, while big data are having an “impact” on underserved communities, it is certainly not the unshakably positive, utopian impact that the title promises.
Big data are a complex of technologies designed, implemented, and controlled by those in power to measure and observe everyone else. Period. Big data are not serving the underserved; they serve the elites who design the systems. Where the goals of those controlling the system align with the underserved, then sure, big data do good. But when the goals and needs of stakeholders are in competition, the people’s needs are quickly forgotten. For something to “serve” a community, the community must have input into what the project or technology is and have at least partial control over its implementation. These are two easy heuristics: who has input, and who has control? Effective and just policy solutions do not come from isolated conversations inside the beltway, from the ivory tower, or from Silicon Valley incubators; solutions are built from working one-on-one with communities in need. Take the well-known example of mass government surveillance systems. In the United States, the National Security Agency justified surveillance programs as necessary for “the common good”, to protect the rights and freedoms of US citizens. Yet, the surveillance programs and the collection of more and more data have trumped individual privacy rights countless times. The American people have no input or control over the system, so it does not serve them.
Despite my dubious reading of the email title, I clicked onward to the substance of the article, wherein my concerns were substantiated. The examples provided to support McSweeny’s claims that big data serve the underserved completely miss the mark. These evidentiary cases, which I analyze below, leave “the underserved community” ill-defined, never clarifying whether the community is isolated rural families, poor individuals, minorities, or the elderly, and represent projects that operate outside of community input or buy-in.
- In California, smart meters are “enabling local authorities to enforce restrictions on water use” among homeowners. Analysis: Boosting law enforcement capability to monitor and fine homeowners over their water usage does not serve underserved communities. Additionally, the monitoring of individual families ignores the fact that the vast majority of California’s water is used by agriculture and industry.
- The Federal Government is using census data to target services, such as “providing prospective college students and their families information that can lead to better financial planning decisions.” Analysis: Targeted marketing of student loan and grant information serves the needs of lenders more than future students, and this project fails to address the rising costs of higher education; instead, it perpetuates student debt. Additionally, completing the census is a legal requirement for US residents, and they have no input into how those data are eventually used for both public services and private marketing.
- The Federal Trade Commission hosted a hackathon (a competition to build a software solution in a short period of time) to solve the “problem of automated telemarketing calls”, with the winning solution stopping “36 million unwanted robocalls.” Analysis: Stopping robocalls does not serve the underserved; it uses FTC resources to solve a first-world annoyance. This project raises the issue of who gets to decide which issues are worth pursuing. Rather than crowdsourcing a serious public health or safety issue, the FTC focused its energy on a minor inconvenience.
While the examples listed in this talk are government programs, it is interesting to note the cozy relationship between private, for-profit industry partners and public decision makers. McSweeny admits this profit motivation early on in her talk, saying “Data is the fuel that powers much of our technological progress, and provides innovators with the raw materials they need to make better apps, better services, and better products.” The relationship between private and public partners is so cozy, in fact, that each example of government programs serving communities makes an ideological conflation: it assumes that “serving underserved communities” occurs when the government uses data to save money. Making a profit is not an inherently bad thing, but it can come into direct conflict with the needs of underserved communities. For one, these communities frequently need specialized services that are not generalizable, making them cost-prohibitive from a market standpoint; yet those costly services should still be delivered. For example, medical care for those suffering rare diseases or translation services for non-English speakers in public schools are expensive yet necessary services. A government for the people should be motivated beyond market concerns.
The tone taken in the body of the Center for Data Innovation article is not surprising. As a think tank (read: lobbyists) located in Washington, D.C., the Center’s purpose is “capitalizing” on the “enormous social and economic benefits” of big data by supporting “public policies designed to allow data-driven innovation to flourish”. “Allowing innovation to flourish”, frequently without scrutiny or oversight, is PR speak for “get out of our way and let us do what we want.” The Center’s goal of reducing barriers to data innovation means a reduction in oversight and consumer protections.
The article concludes, “It was heartening to hear such an esteemed figure in Washington policy community so clearly articulate how data is empowering individuals.” Yet none of the examples empower individuals! No underserved communities had a voice or any control over the implementation of the data system. As an individual in a privileged position, I cannot and should not make claims to know what a perfect big data project for an underserved community is, and that is the point. I am not in a position to know the specific needs of underserved communities. Instead of assuming that I or anyone else in a position of power knows better, “big data for social good” projects should go out and actually ask their target community what their biggest concerns are and involve them in the creation of projects that respond to those needs. Agency is a fundamental condition of fulfilling an individual’s needs, and until a big data project asks for input and gives control to the people it claims to serve, it is failing them.
Candice Lanius is a PhD Candidate in the Department of Communication and Media at Rensselaer Polytechnic Institute who gets annoyed every time she hears someone say “The data speaks for itself!”
Header image source: Jeremy Keith