ASCA+ and APCA+: Extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs
Author(s) -
Thiel Michel,
Féraud Baptiste,
Govaerts Bernadette
Publication year - 2017
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2895
Subject(s) - principal component analysis, multivariate statistics, context (archaeology), design of experiments, estimator, multivariate analysis, computer science, metabolomics, computational biology, multivariate analysis of variance, data mining, bioinformatics, statistics, biology, mathematics, machine learning, artificial intelligence, paleontology
Many modern analytical methods are used to analyse samples from designed experiments, for example in medical, biological, or agronomic fields. These methods usually generate highly multivariate data such as spectra or images. This is the case for the “omics” technologies used to detect genes (genomics), mRNA (transcriptomics), proteins (proteomics), or metabolites (metabolomics) in a specific biological sample. These technologies produce high-dimensional multivariate data sets in which the number of variables (descriptors) tends to be much larger than the number of experimental units. Moreover, omics experiments often follow designs aimed at understanding the effect of several factors on biological systems. Multivariate statistical tools are therefore needed to highlight variables that are consistently modified by different biological states. In this context, two recent methods combine analysis of variance (ANOVA) and principal component analysis (PCA): ASCA (ANOVA–simultaneous component analysis) and APCA (ANOVA-PCA). They provide powerful tools to visualize multivariate structures in the space of each effect of the statistical model linked to the experimental design. Their main limitation is that they provide biased estimators of the factor effects when the experimental design is unbalanced. This paper introduces two new methods, ASCA+ and APCA+, which extend ASCA and APCA, respectively, to unbalanced designs using several principles from the theory of general linear models. Both methods are applied to real-life metabolomics data, clearly demonstrating the capacity of ASCA+ and APCA+ to highlight correct biomarkers corresponding to effects of interest in unbalanced designs.
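The bias mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the procedure, so the example assumes sum-to-zero (+1/-1) coding and ordinary least squares as one common general-linear-model approach. The 2x2 design, the effect sizes, and the replicate counts are hypothetical, the data are noise-free, and only one response variable is used; the PCA step that ASCA and APCA apply to the effect matrices is omitted.

```python
# Hypothetical noise-free 2x2 design (factors A and B, levels coded +1/-1).
# Assumed true sum-coded effects: intercept 10, A = 2, B = 3, no interaction.
def cell_mean(a, b):
    return 10.0 + 2.0 * a + 3.0 * b

# Unbalanced replication (assumed counts): observations per (A, B) cell.
n_rep = {(1, 1): 6, (1, -1): 1, (-1, 1): 1, (-1, -1): 6}
obs = [(a, b, cell_mean(a, b)) for (a, b), n in n_rep.items() for _ in range(n)]
ys = [y for _, _, y in obs]

# Estimate based on raw level means: mean of the A=+1 observations
# minus the grand mean of all observations.
grand_mean = sum(ys) / len(ys)
ys_a_plus = [y for a, _, y in obs if a == 1]
classical_A = sum(ys_a_plus) / len(ys_a_plus) - grand_mean

# General linear model: sum-coded design matrix with columns [1, A, B, A*B].
X = [[1.0, a, b, a * b] for a, b, _ in obs]
p = len(X[0])
XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
Xty = [sum(row[j] * y for row, y in zip(X, ys)) for j in range(p)]

def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Least-squares estimates from the normal equations (X'X) beta = X'y.
beta = solve(XtX, Xty)

print(f"level-mean estimate of the A effect: {classical_A:.3f}")
print(f"least-squares estimate of the A effect: {beta[1]:.3f}")
```

In this assumed design the level-mean estimate of the A effect is 29/7 (about 4.14) although the true effect is 2, because the A=+1 observations are dominated by the B=+1 cell; the least-squares fit of the sum-coded model recovers the true value. In a balanced version of the same design (equal counts in every cell) the two estimates coincide.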