
Analytics and Data Science 101

From Plato to Davenport and Patil

Much of the scientific discovery and learning throughout history has emerged from the work of universities. The origins of what we now call “universities” go back to Plato’s Academy for philosophical thought - the first “university”, where great ideas were debated en masse. However, as scientific study began in earnest, the next generation of critical thinkers, from Leonardo da Vinci to René Descartes to Isaac Newton, realized that inquiry needed structure if it was to attain any empirical meaningfulness. This need gave rise to the formal disciplines and what today we call the “Academy”. Disciplines such as Mathematics, Physics, Chemistry, and the Arts thrived in dedicated settings that offered spaces for discovery, modes of inquiry, models for explanation, and avenues for disseminating findings.

With science came improvements in measurement (the microscope and telescope) and practice (surgery and engineering), prompting academic fields to become deeper and ever more specialized. For much of the past hundred years, scientific advancement has been characterized by continued specialization, driven by improved measurement, the need to advance the careers of academics, and the academy’s need to support its own claim to the gold star of knowledge. This specialization has come to define the current landscape of universities worldwide. In other words ... academics and academic disciplines have historically thrived in highly siloed environments with little to no engagement outside their fields.

As two people with a combined 50 years of experience in academic settings, we can tell you that no organization does “siloes” better than a university.

Only recently have previously siloed academic environments begun to create receptive points of intersection. Consider relatively new fields of study such as biochemistry, behavioral economics, and biomedical engineering - which have evolved through the willingness and intellectual entrepreneurism of researchers and practitioners who saw the value of collaborating outside of their siloed discipline. Similarly, the field of data science has emerged from outside of traditional academic siloes through the intersection of the established disciplines of statistics and computer science.

The term “data science” is actually not new. Its first references can be traced back to computer scientist Peter Naur in 1960 and to statistician John W. Tukey in 1962. Tukey wrote:

For a long time I thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have ... come to feel that my central interest is in data analysis.

(Tukey, 1962)

A reference to the term “data science” as an academic discipline within statistics was made in the proceedings of the Fifth Conference of the International Federation of Classification Societies in 1996¹. In 1997, during his inaugural lecture as the H. C. Carver Chair in Statistics at the University of Michigan, Jeff Wu called for statistics to be renamed “data science” and statisticians to be renamed “data scientists”².

While many in the field of statistics have a justifiable claim on being part of the "founding members club" of data science, statistics - and statisticians - were not prepared for the evolution of data.

Consider Figure 1.1. In the field of statistics, most formal instruction - if it goes beyond the theory - involves translating data that would be characterized as “Small, Structured, and Static” (think Excel spreadsheets) into information, using traditional methods and supervised modeling techniques. However, as data evolves beyond structured files into images, text, and streaming data, into the second and third boxes of Figure 1.1, practitioners (and students) need integrated concepts from computer science - and increasingly from engineering and the humanities - as well as context from the domain where the data originated. Statistics is needed ... but not sufficient. Computer science is needed ... but not sufficient. Marketing ... finance ... engineering ... ethics ... all needed ... but not sufficient.

Figure 1.1 Evolution of data.

Google Trends data for the term “Data Science” indicate that it was already a subject of searches in early 2004 (see Figure 1.2). Around the end of 2012, we see an inflection point, with “Data Science” surging as a popular search term. There are multiple possible explanations for the timing of this inflection, from our increased capacity to access, capture, and store data to the growing number of academic programs with “data science” in the title. However, one article is regularly cited as starting the national conversation about data science: in October 2012, the Harvard Business Review published Thomas Davenport and D. J. Patil’s article “Data Science: The Sexiest Job of the 21st Century”³. The dating profiles for data scientists never looked the same again.

Figure 1.2 Google Trends index for "Data Science" (2004-2020).

And then came the pandemic of 2020, and the world changed forever. During this long period, nearly everyone in the world cautiously withdrew for important safety reasons, yet the work of the world continued, often in new ways and with new demands. We saw the need for expanded disease surveillance, for testing and tracing data, and for mapping the spread of disease, all of which required the skills of data scientists. Public health analytics became a field of unparalleled importance without a pipeline of practitioners. We continue to see tremendous demand for improved video conferencing platforms, enhanced video storage, and online entertainment content, all of which create mineable data. Our increased global reliance on home delivery and distribution services like Amazon, UPS, and Federal Express will have a profound impact on our notions of supply chain efficiency. And all of this increases the need for data security and privacy, and amplifies the wealth of ethical considerations that have emerged along the way.
