acm - an acm publication


  • Big data: big data for social science research

    Academic studies exploiting novel data sources are scarce. Typically, data is generated by commercial businesses or government organizations that have no mandate and little motivation to share their assets with academic partners; partial exceptions include social messaging data and some sources of open data. The mobilization of citizen sensors at a massive scale has allowed impressive infrastructures to be developed. However, data availability is driving applications: problems are prioritized because data is available, not because they are inherently important or interesting. The U.K. is addressing this through the Economic and Social Research Council's investment in its Big Data Network. A group of Administrative Data Research Centres is tasked with improving access to data sets held by central government, while a group of Business and Local Government Centres is tasked with improving access to commercial and regional sources. This paper describes the initiative and illustrates it with examples from health care, transport, and infrastructure. In all of these cases, the integration of data is a key consideration. For social science problems relevant to policy or academic studies, it is unlikely that all the answers will be found in a single novel data source; rather, a combination of sources is required. Through such synthesis, great leaps are possible by exploiting models that have been constructed and refined over extended periods of time, e.g., microsimulation, spatial interaction models, agent-based models, discrete choice models, and input-output models. Although interesting and valuable new methods are appearing, any suggestion that a new box of magic tricks labeled "Big Data Analytics", sitting easily on top of massive new datasets, can radically and instantly transform our long-term understanding of society is naïve and dangerous. Furthermore, the privacy and confidentiality of personal data are a great concern to both the individuals concerned and the data owners.