In news no doubt of interest to sociologists, Facebook is throwing a sociology pre-conference on its campus ahead of the annual American Sociological Association meetings this fall. The company is interested in recruiting sociologists and the work we do, research of the social world in all of its complexity, but as the event’s program shows, its focus is heavily, heavily on quantitative demography. Critical, historical, theoretical, and ethnographic research makes up a great deal of the sociological discipline, but it isn’t the kind of sociology Facebook has ever seemed to be after. Facebook’s focus on quantitative sociology says much about what the company takes “social” to mean.

My background is in stats: I taught inferential statistics to sociology undergrads for a few years, and I dig stats and respect their place in a rich sociological discourse. So I also understand the dangers of statistical sociology done without a heavy dose of qualitative and theoretical work. Facebook and other social media companies have made mistake after mistake with their products, mistakes that reflect a massive deficit of sociological imagination. The scope of their research should reflect and respect the fact that their products reach nearly the entirety of the social world.

Instead, what so many technology companies want from sociology is “big” data research, or what some survey researchers are calling “passive” data collection. One of the scariest things about numbers is that they find a shorter path towards authority; numbers are seductive because they look like answers. While social researchers fluent in statistical methods are calling for a more thoughtful understanding of what “big” data actually is and how it should be responsibly wielded —read danah boyd and Kate Crawford’s paper on this— social media companies, government agencies, and many other research institutions are rushing towards “big” data research at the expense of other methodologies.

What one sociology PhD candidate said in the VentureBeat story linked to above reflects what I hear all too often:

The data set available at Facebook is incredible. One reason is just the sheer scale of the data. While sociologists usually don’t have the resources to interview or survey millions of people, Facebook has data generated every day by its 802 million daily active users.

The second reason is the naturalness of the data. Sociologists typically use interview, survey, and ethnography to collect data.

“So I give you a survey you fill it out, which is very artificial,” said Laura Nelson, PhD candidate in Sociology at UC Berkeley, in an interview with VentureBeat.

“Whereas ethnography, as soon as you walk into the room, you change that room, because you are a foreign presence. There’s a scientist in the room. People get self-conscious. They don’t act naturally.”

In comparison, Facebook data is not influenced by the presence of a social science researcher. “It has no artificially construct, you are not bringing people to the lab,” Nelson said. “So you are recording social interaction in real time as it occurs completely naturally.”

This view is common: “big” data is supposedly more natural and objective because researchers can peer in and gather data without disturbing what happens in this highly recordable social context. Big Ns and small p’s from the comfort of the screen. At this point, methodologists are pulling their hair out: Facebook, or any social platform, isn’t “natural” (even setting aside the nerdier debates about whether anything at all is “natural”). That Facebook’s “big” data is made by users unaware of or unconcerned about social science researchers doesn’t change the fact that it is made through and around a structure engineers have coded. Yes, researchers, quantitative and qualitative alike, bias data in the collection process, but of course so does the data collection process of Facebook, or of any site.

This fallacy of the “naturalness” of social media data is described in the boyd and Crawford paper linked to above (see their provocation 2), and Zeynep Tufekci is especially great on this point. From a paper of hers on “big” data research:

Each social media platform carries with it certain affordances which structure its social norms and interactions and may not be representative of other social media platforms, or general human social behavior


Research in the model organism paradigm can be quite illuminating, as it allows a large community of scholars to coalesce around similar datasets and problems. The field should not, however, lose sight of specific features of each platform and questions of representativeness

The tendency to see “big” social media data as objective and natural is the methodological avatar of the classic tech instrumentalism/constructivism mistake. I’m as tired of the tech constructivism-versus-determinism theory-go-round as anyone, but, quickly: the tech determinism fallacy is the myth that technologies “cause” or force us and the social world to do things or be a certain way, forgetting human agency and creativity; the tech constructivism (or “instrumentalism”) fallacy is the “guns don’t kill people”/“tech is just a tool” line of thinking, which forgets that technologies have affordances (what we think we can or can’t do with them) that structure our selves and the world. Don’t forget about agency, don’t forget about structure, and so on and so on. As simple as it seems, these errors crop up over and over again, and “big” data research too often comes standard with the constructivism fallacy, as the “naturalness” quote above exemplifies.

My scare quotes point to the fundamental smallness of “big” data. I think the term is half a misnomer: “big” refers only to the size of the dataset, not to its ability to answer the questions we ask of it. And this all speaks to the problem of social media companies fixating on “big” data at the expense of the rest of a vastly more diverse sociological imagination. I have my own issues with American sociology as a discipline too often promoting quantitative research over the rest, but it should be made clear that social media companies’ research of the social world is even more dramatically lopsided. What does it mean for users when companies that trade in the “social” don’t attempt to understand the social in anywhere near the complexity the sociological discipline does? And who suffers from the inevitable mistakes that result?

Nathan is on Twitter and Tumblr.
