Cleveland's action plan and the development of data science over the last 12 years
Author(s) - Kane, Michael J.
Publication year - 2014
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11244
Subject(s) - library science, haven, citation, biostatistics, computer science, medicine, mathematics, nursing, combinatorics, public health
Bill Cleveland’s ‘Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics’ [1] appeared in the International Statistical Institute’s Review in 2001. Its goal was to identify six technical areas for the development of data-driven research and to propose an allocation of resources for each of these areas. While the paper presents itself as an action plan for developing the practice of data science, its real significance is that it places the statistician at the confluence of statistics, computer science, and the research question that the collected data are intended to address. It proposes that the statistician’s role should not be to support applied research by providing power calculations and p-values. Instead, the statistician should lead the applied research charge, engaging with all aspects of understanding data, from conception through interpretation. It elevates the statistician to lead investigator, not just in one area of science, social science, or medicine but in all areas of data-driven investigation. Most importantly, it proposes that data analysis provides a normalized approach to scientific research, regardless of the branch.

To contextualize the impact of the paper, I have used Google Trends to track search-term usage over time for ‘data science’, along with ‘cloud computing’ and ‘big data’. A plot of the result is shown in Fig. 1. The latter two terms were chosen because they are related to computing movements and help put the ‘data science’ search traffic into perspective. The graph shows several striking features. First and foremost, we do not see an uptick in ‘data science’ until around 2012. The fact that it took a full 11 years for the impact of the plan to be felt underscores how far ahead of its time it was. Second, this uptick coincides with a more dramatic uptick in ‘big data’. This term is generally used in the context of understanding very large data sets and should be considered a specialty area of data