Data Science

Created by: Karina Gibert

Data Science is the multidisciplinary field that combines data analysis with data processing methods and domain expertise to transform data into understandable and actionable knowledge relevant for informed decision making, thereby helping to bridge Hammond’s Fact Gap, i.e. the disconnect between available data and effective decision-making processes.

 


 by Karina Gibert (Intelligent Data Science and Artificial Intelligence Research Center, Universitat Politècnica de Catalunya-BarcelonaTech)

The paper by Gibert et al. (2018) [1] provides an overview of the origins of the term and a new definition of the field that reflects its current scope. Data Science is a relatively new discipline that has become very popular in recent years. However, the term is much older than might be expected. Its first use is attributed to Naur in 1960 (Sundaresan, 2017)[2], who used it to mean “data processing” in the computer science sense. See (Gibert et al. 2018) for an analysis of the evolution of the term Data Science and its relationship with Computer Science, Machine Learning, Statistics, Data Analysis and Data Mining. Currently, Data Science is more closely related to the terms Data Mining and Big Data Analytics than to the original sense in which Naur coined it, and it is associated with a new business opportunity. In 1997 the first international journal on data mining, Data Mining and Knowledge Discovery, was launched. Between 2002 and 2003 the first journals in data science were launched, The Data Science Journal and The Journal of Data Science, consolidating Data Science as an emergent discipline.

Developments in ICT provided the technological infrastructure that allows most organizations (for-profit or non-profit) to collect large volumes of data as an objective source of information about their targets, and Data Science provides the methodologies and tools to create added value from those data. Data Science is the process of discovering what is unknown from data, and it makes it possible to obtain predictive, actionable insight from data. (Somohano, 2013) attributes the popularity of Data Science in business to the capacity it gives companies to create data products with business impact, communicate business-relevant information from data, and build confidence in decisions that drive business value. Data Science is a multidisciplinary field that embraces data analysis, visualization, machine learning, knowledge management and optimization to disclose value from data. Data Science is also a transversal field that can be applied to any target domain (biology, health, environment, energy, economy, sociology, and others), providing outputs that support the most complex decisions with evidence drawn from data and contributing to a better understanding and management of complex phenomena of any kind. (Lauro, 2017)[3] characterizes Data Science as the process by which data are transformed into actionable knowledge to perform predictions as well as to support and validate decisions. Lauro uses the metaphor that “Computer Science represents the language of Data Science, Statistics the logics of Data Science, and domain expertise constitutes a catalytic element in the absence of which the transformation cannot be achieved”.

Nowadays, Data Science deals with many sources of information beyond the classical numerical indicators used to measure reality, and data science analysis can extract decisional knowledge from videos, audio recordings, signals, data streams, or websites. The development of Data Science has promoted a new concept of decision-making in which decisions are data-driven and the added value for organizations (either institutions or companies) is no longer technology or capital but information, with data considered a primary source of knowledge. Data Science often requires the ability to overcome data complexity and the limitations of classical statistics and machine learning techniques, for example by dealing simultaneously with heterogeneous data sources (e.g., videos, text, or streams) or by coping with non-independence, non-normality, and few technical hypotheses about the variables’ distributions, when required.

Data Science constitutes a new discipline and requires a specific combination of skills to be used properly. Conway’s Data Science Venn Diagram (Figure 1, Conway, 2013[4]) provides a useful conceptualization of how coding skills (Conway calls them hacking skills), math and statistics knowledge, and domain science expertise (Conway calls this substantive expertise) come together to enable Data Science. It is only at the intersection of these elements that Data Science can be most effective.

 

 

References

  1. Gibert, K., Horsburgh, J. S., Athanasiadis, I. N., Holmes, G. (2018). Environmental Data Science. Environmental Modelling & Software, 106, 4-12. doi:10.1016/j.envsoft.2018.04.005.
  2. Sundaresan, N. (2017). The history of data science. Huffpost, May 25th, 2017, Quora.
  3. Lauro, N. C., Amaturo, E., Grassia, M. G., Aragona, B., Marino, M. (eds.) (2017). Data Science and Social Research: Epistemology, Methods, Technology and Applications. Studies in Classification, Data Analysis and Knowledge Organization v 1564. Springer Int’l. ISBN: 978-3-319-55477-8.
  4. Conway, D. (2013). The Data Science Venn Diagram. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram