Open Access
Statistical approach to normalization of feature vectors and clustering of mixed datasets
Author(s) -
Maria M. Suarez-Alvarez,
Duc Truong Pham,
Mikhail Y. Prostov,
Yuriy I. Prostov
Publication year - 2012
Publication title -
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Language(s) - English
Resource type - Journals
eISSN - 1471-2946
pISSN - 1364-5021
DOI - 10.1098/rspa.2011.0704
Subject(s) - normalization (sociology) , categorical variable , cluster analysis , minkowski distance , euclidean distance , mathematics , data mining , pattern recognition (psychology) , computer science , algorithm , artificial intelligence , statistics , sociology , anthropology
Normalization of feature vectors of datasets is widely used in a number of fields of data mining, in particular in cluster analysis, where it is used to prevent features with large numerical values from dominating in distance-based objective functions. In this study, a unified statistical approach to normalization of all attributes of mixed databases, when different metrics are used for numerical and categorical data, is proposed. After the proposed normalization, the contributions of both numerical and categorical attributes to a specified objective function are statistically the same. Formulae for the statistically normalized Minkowski mixed p-metrics are given in an explicit way. It is shown that the classic z-score standardization and the min–max normalization are particular cases of the statistical normalization, when the objective function is, respectively, based on the Euclidean or the Tchebycheff (Chebyshev) metrics. Finally, clustering of several benchmark datasets is performed with non-normalized and introduced normalized mixed metrics using either the k-prototypes (for p=2) or another algorithm (for p≠2).
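
For readers unfamiliar with the two classic normalizations named in the abstract, the following is a minimal illustrative sketch, not the paper's formulae: it shows the standard z-score and min–max transforms for a numerical feature, together with a plain Minkowski p-distance. The function names, the toy data, and the use of the population standard deviation are assumptions made for illustration; the statistically normalized mixed metrics for categorical attributes are not reproduced here.

```python
from statistics import mean, pstdev


def z_score(values):
    """Centre on the mean and scale by the population standard deviation.

    Assumption: population (not sample) standard deviation is used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def minkowski(x, y, p):
    """Minkowski p-distance; p = float('inf') gives the Chebyshev distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Hypothetical two-feature dataset in which income dwarfs age numerically.
ages = [25, 35, 45, 55]
incomes = [20000, 40000, 60000, 80000]

# Without normalization, the distance is driven almost entirely by income.
raw_distance = minkowski((ages[0], incomes[0]), (ages[1], incomes[1]), 2)

# After z-score normalization, both features contribute on the same scale.
z_ages, z_incomes = z_score(ages), z_score(incomes)
scaled_distance = minkowski((z_ages[0], z_incomes[0]), (z_ages[1], z_incomes[1]), 2)
```

In this toy data the raw Euclidean distance between the first two records is dominated by the income difference, whereas after z-scoring the two features contribute equally, which is the effect the abstract describes for distance-based objective functions.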
