Are your data gathered?
Author(s) - Alban Siffer, Pierre-Alain Fouque, Alexandre Termier, Christine Largouët
Publication year - 2018
Publication title - hal (le centre pour la communication scientifique directe)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/3219819.3219994
Subject(s) - unimodality, computer science, outlier, cluster analysis, statistical hypothesis testing, relevance (law), artificial intelligence, data mining, multivariate statistics, simple (philosophy), multivariate normal distribution, machine learning, pattern recognition (psychology), mathematics, statistics, philosophy, epistemology, political science, law
Understanding data distributions is one of the most fundamental research topics in data analysis. The literature provides many powerful statistical learning algorithms for gaining knowledge about the underlying distribution of multivariate observations. Such analyses may reveal a dependence between features, the appearance of clusters, or the presence of outliers. Before such deep investigations, we propose the folding test of unimodality. As a simple statistical description, it detects whether data are gathered or not (unimodal or multimodal). To the best of our knowledge, this is the first multivariate and purely statistical unimodality test. It makes no distribution assumption and relies only on a straightforward p-value. Through real-world data experiments, we show its relevance and how it could be useful for clustering.
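
The abstract does not spell out how the test statistic is computed. As a rough illustration only, the sketch below assumes the formulation usually associated with the folding idea: fold the data around a pivot that minimizes the variance of the squared distances to it, take the ratio of the folded variance to the total variance, and rescale by (1 + d)^2 so that a uniform distribution sits near 1. The function name `folding_statistic` and the decision threshold of 1 are assumptions of this sketch, not details taken from this record, and the p-value mentioned in the abstract is not computed here.

```python
import numpy as np

def folding_statistic(X):
    """Hypothetical sketch of a folding-style unimodality statistic.

    Values well above 1 suggest gathered (unimodal) data,
    values well below 1 suggest multimodal data.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]              # treat a 1-D sample as n points in dimension 1
    n, d = X.shape

    # Sample covariance matrix, forced to shape (d, d) even when d == 1.
    cov = np.cov(X, rowvar=False).reshape(d, d)

    # Covariance between each feature and the squared norm of the points.
    sq_norm = np.sum(X ** 2, axis=1)
    centered = X - X.mean(axis=0)
    cross_cov = centered.T @ (sq_norm - sq_norm.mean()) / (n - 1)

    # Pivot minimizing Var(||X - s||^2): s = 0.5 * cov^{-1} * cross_cov.
    pivot = 0.5 * np.linalg.solve(cov, cross_cov)

    # Fold the data around the pivot and compare variances.
    folded = np.linalg.norm(X - pivot, axis=1)
    ratio = folded.var(ddof=1) / np.trace(cov)

    # Rescale so that the uniform distribution gives roughly 1.
    return (1 + d) ** 2 * ratio
```

For example, calling `folding_statistic` on a single Gaussian blob should return a value above 1, while two well-separated blobs should return a value below 1. Turning the statistic into a p-value would need an additional significance step that this sketch leaves out.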