
Mallows' L2 distance in some multivariate methods and its application to histogram-type data
Author(s) -
Katarina Ko,
L. Billard
Publication year - 2012
Publication title -
Metodološki zvezki
Language(s) - English
Resource type - Journals
eISSN - 1854-0031
pISSN - 1854-0023
DOI - 10.51936/polr7329
Subject(s) - histogram, multivariate statistics, simple (philosophy), mathematics, scaling, multidimensional scaling, distance measures, cluster analysis, type (biology), term (time), interpretation (philosophy), inertia, algorithm, computer science, artificial intelligence, statistics, geometry, image (mathematics), physics, ecology, philosophy, epistemology, classical mechanics, quantum mechanics, biology, programming language
Mallows' L2 distance allows the total inertia to be decomposed into within-group and between-group inertia according to the Huygens theorem. The squared distance itself can be decomposed into three terms: a location term, a spread term and a shape term; a simple and straightforward proof of this decomposition is presented. These properties are very helpful when interpreting the results of some distance-based methods, such as k-means clustering and classical multidimensional scaling. For histogram-type data, Mallows' L2 distance is preferable because it is simple to calculate, even when the number and length of the histograms' subintervals differ. Its use is illustrated on population pyramids for 14 East European countries in the period 1995–2015. The results provide insight into the information that this distance can extract from a complex dataset.
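As a rough illustration of the calculation described in the abstract, the sketch below computes the squared Mallows' L2 distance between two histograms as the integral of the squared difference of their quantile functions, and splits it into location, spread and shape terms. It is not the authors' implementation. It assumes that mass is uniform within each subinterval and that every subinterval has positive probability, and it writes the three terms in one common form: squared difference of means, squared difference of standard deviations, and twice the gap between the product of standard deviations and the covariance of the two quantile functions. The function names `cdf_levels`, `quantile` and `mallows_terms`, and the example histograms, are hypothetical.

```python
from math import sqrt

def cdf_levels(probs):
    """Cumulative probabilities at the bin edges, from 0 to 1."""
    levels = [0.0]
    for p in probs:
        levels.append(levels[-1] + p)
    levels[-1] = 1.0  # guard against rounding in the running sum
    return levels

def quantile(edges, levels, t):
    """Piecewise-linear quantile function (assumes uniform mass per bin,
    and every bin probability > 0)."""
    for i in range(len(levels) - 1):
        if t <= levels[i + 1]:
            w = (t - levels[i]) / (levels[i + 1] - levels[i])
            return edges[i] + w * (edges[i + 1] - edges[i])
    return edges[-1]

def mallows_terms(edges_f, probs_f, edges_g, probs_g):
    """Return (squared distance, location, spread, shape) for two histograms."""
    lf, lg = cdf_levels(probs_f), cdf_levels(probs_g)
    # Both quantile functions are linear between consecutive merged levels,
    # so every integral below is exact on each piece.
    ts = sorted(set(lf) | set(lg))
    d2 = mf = mg = ef2 = eg2 = cross = 0.0
    for a, b in zip(ts, ts[1:]):
        h = b - a
        f0, f1 = quantile(edges_f, lf, a), quantile(edges_f, lf, b)
        g0, g1 = quantile(edges_g, lg, a), quantile(edges_g, lg, b)
        e0, e1 = f0 - g0, f1 - g1
        d2 += h * (e0 * e0 + e0 * e1 + e1 * e1) / 3
        mf += h * (f0 + f1) / 2
        mg += h * (g0 + g1) / 2
        ef2 += h * (f0 * f0 + f0 * f1 + f1 * f1) / 3
        eg2 += h * (g0 * g0 + g0 * g1 + g1 * g1) / 3
        cross += h * (2 * f0 * g0 + f0 * g1 + f1 * g0 + 2 * f1 * g1) / 6
    sf = sqrt(max(ef2 - mf * mf, 0.0))
    sg = sqrt(max(eg2 - mg * mg, 0.0))
    location = (mf - mg) ** 2
    spread = (sf - sg) ** 2
    shape = 2 * (sf * sg - (cross - mf * mg))
    return d2, location, spread, shape

# Two made-up histograms whose subintervals differ in number and length.
d2, location, spread, shape = mallows_terms(
    [0, 1, 3], [0.2, 0.8],
    [0, 2, 3, 6], [0.5, 0.3, 0.2],
)
print(d2, location, spread, shape)
```

Because both quantile functions are linear between consecutive merged cumulative levels, no common binning of the two histograms is needed, which is the practical simplicity the abstract refers to. The three terms are computed independently of the distance here, so their sum can be compared with `d2` as a numerical check of the decomposition.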