Big data: big data or big brother? that is the question now.
by Jeffrey Johnson, Peter Denning, Kemal A. Delic, David Sousa-Rodrigues
This ACM Ubiquity Symposium presented some of the current thinking about big data developments across four topical dimensions: social, technological, application, and educational. While 10 articles can hardly touch the expanse of the field, we have sought to cover the most important issues and provide useful insights for the curious reader. More than two dozen authors from academia and industry shared their points of view, their current focus of interest, and their outlines of future research. Big digital data has changed and will change the world in many ways. It will bring some big benefits in the future but, combined with big AI and big IoT devices, it creates several big challenges. These must be carefully addressed and properly resolved for the future benefit of humanity.
Big Data: Business, Technology, Education, and Science: Big Data (Ubiquity symposium)
by Jeffrey Johnson, Luca Tesei, Marco Piangerelli, Emanuela Merelli, Riccardo Paci, Nenad Stojanovic, Paulo Leitão, José Barbosa, Marco Amador
Transforming the latent value of big data into real value requires the intelligence and application of human data scientists. Data scientists are expected to have a wide range of technical skills, alongside being passionate, self-directed people who are able to work easily with others and deliver high-quality outputs under pressure. There are hundreds of university, commercial, and online courses in data science and related topics. Apart from people with breadth and depth of knowledge and experience in data science, we identify a new educational path to train "bridge persons" who combine knowledge of an organization's business with sufficient knowledge and understanding of data science to "bridge" between non-technical people in the business and the highly skilled data scientists who add value to the business. The increasing proliferation of big data and the great advances made in data science do not herald an era where all problems can be solved by deep learning and artificial intelligence. Although data science opens up many commercial and social opportunities, it must complement other sciences in the search for new theory and methods to understand and manage our complex world.
Corporate Security is a Big Data Problem: Big Data (Ubiquity symposium)
by Louisa Saunier, Kemal A. Delic
In modern times, we have seen a major shift toward hybrid cloud architectures, where corporations operate in a large, highly extended ecosystem. Thus, the traditional enterprise security perimeter is disappearing and evolving into the concept of security intelligence, where the volume, velocity/rate, and variety of data have dramatically changed. Today, to cope with the fast-changing security landscape, we need to be able to transform huge data lakes, via security analytics and big data technologies, into effective security intelligence presented through a security "cockpit," in order to achieve a better level of corporate security and compliance and to support sound risk management and informed decision making. We present a high-level architecture for efficient security intelligence and the concept of a security cockpit as a point of control for the corporate security and compliance state. We therefore conclude that corporate security today can be perceived as a big data problem.
When Good Machine Learning Leads to Bad Security: Big Data (Ubiquity symposium)
by Tegjyot Singh Sethi, Mehmed Kantardzic
While machine learning has proven to be promising in several application domains, our understanding of its behavior and limitations is still in its nascent stages. One such domain is cybersecurity, where machine learning models are replacing traditional rule-based systems, owing to their ability to generalize and deal with large-scale attacks that have not been seen before. However, the naive transfer of machine learning principles to the domain of security needs to be undertaken with caution. Machine learning was not designed with security in mind and as such is prone to adversarial manipulation and reverse engineering. While most data-based learning models rely on a static assumption of the world, the security landscape is especially dynamic, with an ongoing, never-ending arms race between the system designer and the attackers. Any solution designed for such a domain needs to take into account an active adversary and needs to evolve over time in the face of emerging threats. We term this the "Dynamic Adversarial Mining" problem, and this paper provides motivation and foundation for this new interdisciplinary area of research, at the crossroads of machine learning, cybersecurity, and streaming data mining.
Developing an Open Source 'Big Data' Cognitive Computing Platform: Big Data (Ubiquity symposium)
by Michael Kowolenko, Mladen A. Vouk
The ability to leverage diverse data types requires a robust and dynamic approach to systems design. The needs of a data scientist are as varied as the questions being explored. Compute systems have focused on the management and analysis of structured data as the driving force of analytics in business. As open source platforms have evolved, the ability to apply compute to unstructured information has exposed an array of platforms and tools available to the business and technical community. We have developed a platform that meets analytics users' requirements for both structured and unstructured data. This analytics workbench is based on acquisition, transformation, and analysis using open source tools such as Nutch, Tika, Elastic, Python, PostgreSQL, and Django to implement a cognitive compute environment that can handle widely diverse data and can leverage the ever-expanding capabilities of infrastructure in order to provide intelligence augmentation.
High Performance Synthetic Information Environments: An integrating architecture in the age of pervasive data and computing: Big Data (Ubiquity symposium)
by Christopher L. Barrett, Jeffrey Johnson, Madhav Marathe
The complexities of social and technological policy domains, such as the economy, the environment, and public health, present challenges that require a new approach to modeling and decision-making. The information required for effective policy and decision making in these complex domains is massive in scale, fine-grained in resolution, and distributed over many data sources. Thus, one of the key challenges in building systems to support policy informatics is information integration. Synthetic information environments (SIEs) present a methodological and technological solution that goes beyond the traditional approaches of systems theory, agent-based simulation, and model federation. An SIE is a multi-theory, multi-actor, multi-perspective system that supports continual data uptake, state assessment, decision analysis, and action assignment based on large-scale high-performance computing infrastructures. An SIE allows rapid course-of-action analysis to bound variances in outcomes of policy interventions, which in turn allows the short time-scale planning required in response to emergencies such as epidemic outbreaks.
Technology and Business Challenges of Big Data in the Digital Economy: Big Data (Ubiquity symposium)
by Dave Penkler
The early digital economy during the dot-com days of internet commerce successfully faced its first big data challenges of click-stream analysis with map-reduce technology. Since then, the digital economy has become much more pervasive. As the digital economy evolves, looking to benefit from its burgeoning big data assets, an important technical-business challenge is emerging: how to acquire, store, access, and exploit the data at a cost that is lower than the incremental revenue or GDP that its exploitation generates. This challenge is especially pressing now that the efficiency increases, which lasted for 50 years thanks to improvements in semiconductor manufacturing, are slowing and coming to an end.
Big Data for Social Science Research: Big Data (Ubiquity symposium)
by Mark Birkin
Academic studies exploiting novel data sources are scarce. Typically, data is generated by commercial businesses or government organizations with no mandate and little motivation to share their assets with academic partners; partial exceptions include social messaging data and some sources of open data. The mobilization of citizen sensors at a massive scale has allowed for the development of impressive infrastructures. However, data availability is driving applications: problems are prioritized because data is available rather than because they are inherently important or interesting. The U.K. is addressing this through investments by the Economic and Social Research Council in its Big Data Network. A group of Administrative Data Research Centres is tasked with improving access to data sets in central government, while a group of Business and Local Government Centres is tasked with improving access to commercial and regional sources. This initiative is described and illustrated with examples from health care, transport, and infrastructure. In all of these cases, the integration of data is a key consideration. For social science problems relevant to policy or academic studies, it is unlikely that all the answers will be found in a single novel data source; rather, a combination of sources is required. Through such synthesis, great leaps are possible by exploiting models that have been constructed and refined over extended periods of time, e.g., microsimulation, spatial interaction models, agents, discrete choice, and input-output models. Although interesting and valuable new methods are appearing, any suggestion that a new box of magic tricks labeled "Big Data Analytics" that sits easily on top of massive new datasets can radically and instantly transform our long-term understanding of society is naïve and dangerous. Furthermore, the privacy and confidentiality of personal data are a great concern to both the individuals concerned and the data owners.