Descriptive statistics of large data sets by scatter plots, an exploratory approach
Author(s) - Rey W.J.J.
Publication year - 1992
Publication title - Statistica Neerlandica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.52
H-Index - 39
eISSN - 1467-9574
pISSN - 0039-0402
DOI - 10.1111/j.1467-9574.1992.tb01346.x
Subject(s) - scatter plot, plot (graphics), statistic, mathematics, exploratory data analysis, statistics, selection (genetic algorithm), tree (set theory), computer science, pattern recognition (psychology), algorithm, artificial intelligence, combinatorics
In the analysis of large tables of M variables on N observations, one is interested in the relations between the variables, and it is usual to inspect the M(M-1)/2 scatter plots of N points. The scatter plot approach relies on visual inspection and is to be preferred, in so far as it is applicable, for detecting simple relations, namely when M is small. Other approaches are needed for large values of M. We consider that only the relatively few scatter plots that present a ‘structure’ are of interest for an exploratory analysis and, by ‘structure’, we mean a domain of especially high local density in the plot. Based on this concept, we propose a method built around two steps: the selection of the possibly interesting pairs of variables and the validation of the corresponding scatter plots. The selection of the pairs results from an algorithm based on a binary partitioning tree. The validation step ensures that only those scatter plots in which a structure is found are produced; the recognition of a structure is derived from a statistic based on the length of the Minimum Spanning Tree constructed on the N points of the candidate scatter plot. For illustration, we report on an industrial application where the method is routinely applied for exploratory purposes.
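The abstract does not give the exact form of the validation statistic, so the following is only a minimal sketch of its raw ingredient: the total edge length of the Euclidean Minimum Spanning Tree on the N points of one scatter plot. The function name `mst_length`, the use of Prim's algorithm, and the synthetic data are assumptions made for illustration, not the author's implementation. The idea it illustrates is that a plot containing a region of high local density tends to have a shorter MST than a plot of the same size whose points are spread evenly.

```python
import math
import random


def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree.

    Uses Prim's algorithm on the complete graph, O(N^2) time.
    `points` is a sequence of (x, y) pairs. (Hypothetical helper.)
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    # best[i] = shortest distance from point i to the tree built so far
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # Update the distances of the remaining points to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


# Synthetic illustration (assumed data, not from the paper):
# 200 points spread uniformly versus 50 uniform points plus a dense cluster of 150.
rng = random.Random(0)
uniform = [(rng.random(), rng.random()) for _ in range(200)]
clustered = [(rng.random(), rng.random()) for _ in range(50)]
clustered += [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(150)]

print("MST length, uniform plot:  ", mst_length(uniform))
print("MST length, clustered plot:", mst_length(clustered))
```

A usable statistic would still have to standardise this length, for instance against its distribution under a no-structure reference for the same N, before a pair of variables is declared interesting; how the paper does this is not stated in the abstract.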