z-logo
Premium
Thinking by classes in data science: the symbolic data analysis paradigm
Author(s) -
Diday Edwin
Publication year - 2016
Publication title -
wiley interdisciplinary reviews: computational statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.693
H-Index - 38
eISSN - 1939-0068
pISSN - 1939-5108
DOI - 10.1002/wics.1384
Subject(s) - computer science , symbolic data analysis , row , population , table (database) , set (abstract data type) , extension (predicate logic) , theoretical computer science , data set , data mining , artificial intelligence , database , programming language , demography , sociology
Data Science, considered as a science by itself, is in general terms, the extraction of knowledge from data. Symbolic data analysis ( SDA ) gives a new way of thinking in Data Science by extending the standard input to a set of classes of individual entities. Hence, classes of a given population are considered to be units of a higher level population to be studied. Such classes often represent the real units of interest. In order to take variability between the members of each class into account, classes are described by intervals, distributions, set of categories or numbers sometimes weighted and the like. In that way, we obtain new kinds of data, called ‘symbolic’ as they cannot be reduced to numbers without losing much information. The first step in SDA is to build the symbolic data table where the rows are classes and the variables can take symbolic values. The second step is to study and extract new knowledge from these new kinds of data by at least an extension of Computer Statistics and Data Mining to symbolic data. SDA is a new paradigm which opens up a vast domain of research and applications by giving complementary results to classical methods applied to standard data. SDA also gives answers to big data and complex data challenges as big data can be reduced and summarized by classes and as complex data with multiple unstructured data tables and unpaired variables can be transformed into a structured data table with paired symbolic‐valued variables. WIREs Comput Stat 2016, 8:172–205. doi: 10.1002/wics.1384 This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here