Destruction of normal distribution in small samples by centering and scaling
Author(s) - Tóth Gergely
Publication year - 2011
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1382
Subject(s) - scaling , standard deviation , normal distribution , distribution (mathematics) , mathematics , sample size determination , population , statistics , statistical physics , mathematical analysis
The scientific literature rarely emphasizes that centering and scaling can drastically change the original distribution of small samples. How much the original distribution is destroyed depends on how the mean (used for centering) and the divisor (used for scaling, and related to the spread of the data) are estimated. In this comparative study we focus on samples drawn from normally distributed data, where the means and standard deviations are either population based or sample based. We discuss six cases of transforming the data or the sample means. Most of them have been studied previously, but some have not been investigated theoretically. The transformed data follow a normal distribution in three cases, namely when scaling is performed with the population standard deviation. In one case the final distribution is related to the β-distribution, with striking density functions: a Dirac delta for N = 2, a "Viking helmet"-like shape for N = 3, and a uniform distribution for N = 4. Another case yields the well-known t-distribution. For one of the transformations we were not able to identify the general form and obtained only numerical results for N ≥ 3. The effect of the transformations was tested on experimental data representing more or less normally distributed variables. For small samples, data transformed with the sample standard deviation were significantly less normal-like than the original data, whereas the other transformations enhanced the normal-like character. The results show that centering, and especially scaling, require careful consideration for small samples, to the point of calling into question the validity of subsequent data evaluation that assumes a normal distribution. Copyright © 2011 John Wiley & Sons, Ltd.
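Several of these cases can be checked with a short Monte Carlo simulation. The sketch below is an illustration, not the author's code. It assumes a standard normal population (MU = 0, SIGMA = 1) and uses only the Python standard library. The helper names (draw_sample, transform_first, t_statistics) and the mode labels are hypothetical. Identifying the β-related case with (x - x̄)/s, where s is the sample standard deviation, is also an assumption. It is consistent with the shapes described in the abstract: for N = 2 the ratio can only be ±1/√2, and for N = 4 it is spread uniformly over ±(N - 1)/√N = ±1.5.

```python
import math
import random
import statistics

# Assumed population parameters of the normal source (illustrative only).
MU, SIGMA = 0.0, 1.0


def draw_sample(n, rng):
    """Draw n independent values from N(MU, SIGMA**2)."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


def transform_first(n, trials, mode, seed=0):
    """Transform the first observation of each of `trials` samples of size n.

    mode (hypothetical labels for three of the centering/scaling choices):
      "pop"  : (x - MU) / SIGMA    population mean and SD -> standard normal
      "xbar" : (x - xbar) / SIGMA  sample mean, population SD -> normal, var (n-1)/n
      "both" : (x - xbar) / s      sample mean and SD -> bounded, beta-related
    """
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        xs = draw_sample(n, rng)
        x, xbar = xs[0], statistics.fmean(xs)
        if mode == "pop":
            out.append((x - MU) / SIGMA)
        elif mode == "xbar":
            out.append((x - xbar) / SIGMA)
        elif mode == "both":
            # statistics.stdev uses the n - 1 divisor (sample SD).
            out.append((x - xbar) / statistics.stdev(xs))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


def t_statistics(n, trials, seed=0):
    """(xbar - MU) / (s / sqrt(n)) per sample: Student t with n - 1 dof."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        xs = draw_sample(n, rng)
        s = statistics.stdev(xs)
        out.append((statistics.fmean(xs) - MU) / (s / math.sqrt(n)))
    return out


if __name__ == "__main__":
    # N = 2: every value is +-1/sqrt(2), i.e. two Dirac-delta spikes.
    print(sorted({round(v, 6) for v in transform_first(2, 1000, "both")}))
    # prints [-0.707107, 0.707107]

    # N = 4: values fill [-(N-1)/sqrt(N), (N-1)/sqrt(N)] = [-1.5, 1.5] evenly.
    v4 = transform_first(4, 10000, "both")
    print(min(v4), max(v4))

    # t with n - 1 = 9 dof has variance 9/7 > 1 (heavier tails than the normal).
    print(statistics.variance(t_statistics(10, 10000)))
```

The hard bound |x - x̄|/s ≤ (N - 1)/√N follows from Samuelson's inequality. It is one way to see why values standardized with the sample mean and sample standard deviation cannot keep normal tails for small N. Values scaled by the population standard deviation stay normally distributed.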