Brian Knowlton at the New York Times Caucus Blog posted about a conference call with the White House’s new Chief Information Officer, Vivek Kundra. Of special interest to social scientists are his plans to create a data.gov repository. I’m getting a bit wary of these new freestanding “.gov” sites and wondering if they represent an “openness meme” rather than actual openness. But the rhetoric sounds promising. From Saul Hansell’s Bits blog at the New York Times:
Another initiative will be to create a new site, Data.gov, that will become a repository for all the information the government collects. He pointed to the benefits that have already come from publishing the data from the Human Genome Project by the National Institutes of Health, as well as the information from military satellites that is now used in GPS navigation devices.
“There is a lot of data the federal government has and we need to make sure that all the data that is not private, or restricted for national security reasons, can be made public,” he said.
While more data availability is all to the good, we in the social sciences should keep an eye on the type of data that gets released. The federal government puts out a fair amount of quantitative data already. What I’m interested in is how that data is going to be made available. Will the layman with an interest in an issue be able to quickly mash up data and applications to create information they can use? Can a local activist get water quality data from the EPA and easily create a Google Map that shows areas of concern? It’s one thing to do a “data dump”; it’s another to be intentional in empowering people to use the data. Then again, that might be best left to “the crowd” of politically active geeks, whose numbers I hope grow exponentially in the next few years.
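To make the water quality scenario concrete, here is a minimal sketch of the kind of mashup I have in mind. Everything in it is an assumption for illustration: the site names, coordinates, readings, column names, and the threshold are invented, not real EPA data or an official standard. It simply turns a small CSV of readings into GeoJSON, a format most web mapping tools can load, and flags the points above the threshold as areas of concern.

```python
import csv
import io
import json

# Hypothetical sample data: sites, coordinates, and readings are invented
# for illustration and do not come from any real EPA data set.
SAMPLE_CSV = """site,lat,lon,nitrate_mg_l
Site A,34.28,-119.29,4.2
Site B,34.37,-119.14,12.5
Site C,34.20,-119.18,9.9
"""

# Illustrative threshold only; look up the applicable standard before relying on it.
CONCERN_THRESHOLD_MG_L = 10.0


def readings_to_geojson(csv_text, threshold=CONCERN_THRESHOLD_MG_L):
    """Convert CSV rows into a GeoJSON FeatureCollection of points."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row["nitrate_mg_l"])
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {
                "site": row["site"],
                "nitrate_mg_l": value,
                "concern": value > threshold,
            },
        })
    return {"type": "FeatureCollection", "features": features}


geojson = readings_to_geojson(SAMPLE_CSV)
flagged = [f["properties"]["site"] for f in geojson["features"]
           if f["properties"]["concern"]]
print("Sites above threshold:", flagged)
print(json.dumps(geojson, indent=2))
```

The point is that the hard part is not the twenty lines of code but getting the readings in a clean, documented, machine-readable form in the first place.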
Comments 3
Kenneth M. Kambara — March 6, 2009
I'd like to see more data available and scrutinized by the public. I know that data like DOL statistics have often been suppressed and that the weighting schemes are oftentimes kept secret. While these issues may not go away, hopefully more transparency and openness will prevail.
I think the crowd of politically active geeks will drive the data use at least for now. I can also see a cottage industry springing up of tech-savvy firms jacking the data for marketing and strategy research for firms, organizations, and campaigns.
rkatclu — March 6, 2009
Amazon has recently made over 1TB(!) of public data available online as well.
http://www.readwriteweb.com/archives/amazon_exposes_1_terrabyte_of.php
http://aws.typepad.com/aws/2009/02/new-aws-public-data-sets-economics-dbpedia-freebase-and-wikipedia.html
Jennifer — March 7, 2009
I recently looked at the major bureaucratic data sets: social services, EPA, USDA, FDA and DPR. I read about your idea of data dumps, and that is what they are doing. A lot of the time series data is self-reported, fed straight from the source. The "juicy" data is suppressed. I am currently trying to start my own research project, where I want to see if there is any correlation between increases in Hispanic children with mental disabilities and birth defects. I recently spoke to a special education teacher in Ventura County, and he told me that over the last 10-15 years he has seen a sharp increase in Hispanic children in his classrooms. I would like to see if this is just due to the increasing Hispanic population or whether it is an effect of parents' exposure to harmful chemicals. I have contacted the social workers in Ventura County and gotten no response. I have gone online: NO DATA. Transparency is needed in order for little people like myself and others to piece together the data, find patterns, and maybe help solve social issues.