Classification of MALDI‐MS imaging data of tissue microarrays using canonical correlation analysis‐based variable selection
Author(s) - Winderbaum Lyron, Koch Inge, Mittal Parul, Hoffmann Peter
Publication year - 2016
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.201500451
Subject(s) - canonical correlation, sample size determination, computer science, feature selection, pattern recognition (psychology), dimensionality reduction, correlation, principal component analysis, artificial intelligence, rank (graph theory), data mining, computational biology, statistics, mathematics, biology, geometry, combinatorics
Applying MALDI‐MS imaging to tissue microarrays (TMAs) provides access to proteomics data from large patient cohorts in a cost‐ and time‐efficient way, and opens up the potential for applying this technology in clinical diagnosis. These TMA data are high‐dimensional with low sample size, which poses challenges for statistical analysis: classical methods typically require a nonsingular covariance matrix, a requirement that cannot be met when the dimension exceeds the sample size. We use TMAs to collect data from primary endometrial carcinomas of 43 patients. Each patient has a lymph node metastasis (LNM) status of positive or negative, which we predict from the MALDI‐MS imaging TMA data. We propose a variable selection approach based on canonical correlation analysis that explicitly uses the LNM information, and we apply LDA to the selected variables only. Our method misclassifies 2.3–20.9% of patients under leave‐one‐out cross‐validation and strongly outperforms LDA applied after reducing the original data with principal component analysis.
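The pipeline outlined in the abstract (label-aware variable selection, then LDA on the selected variables, scored by leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. As a simplifying assumption, each variable is ranked by its absolute sample correlation with the binary LNM label, which is the canonical correlation between one variable and one label; the paper's actual ranking criterion may differ. The function names, the synthetic data, and the choice of k = 5 are hypothetical and serve illustration only.

```python
import numpy as np

def rank_by_correlation(X, y):
    """Order variables by |Pearson correlation| with the binary label (assumed criterion)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant variables receive correlation 0
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(-corr)  # indices, most correlated first

def lda_fit(X, y):
    """Two-class LDA with a pooled covariance estimate."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.pinv(S) @ (m1 - m0)  # pseudo-inverse guards against a singular S
    b = -0.5 * w @ (m0 + m1) + np.log(len(X1) / len(X0))
    return w, b

def lda_predict(model, X):
    w, b = model
    return (X @ w + b > 0).astype(int)

def loocv_error(X, y, k):
    """Leave-one-out misclassification rate; selection is redone in every fold."""
    n = len(y)
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        # Select variables on the training patients only, so the held-out
        # patient never influences which variables are kept.
        top = rank_by_correlation(X[train], y[train])[:k]
        model = lda_fit(X[train][:, top], y[train])
        errors += int(lda_predict(model, X[i:i + 1, top])[0] != y[i])
    return errors / n

# Synthetic stand-in for the TMA data: 43 "patients", 500 variables,
# of which only the first 5 carry signal about the label.
rng = np.random.default_rng(0)
n, p = 43, 500
y = np.zeros(n, dtype=int)
y[:15] = 1
X = rng.normal(size=(n, p))
X[:, :5] += 3.0 * y[:, None]

err = loocv_error(X, y, k=5)
print(f"LOOCV misclassification rate: {err:.3f}")
```

Repeating the selection inside each fold is a deliberate choice: ranking variables on all patients before cross-validating would leak label information into the held-out prediction and understate the error rate. Keeping k well below the number of training patients is also what makes the pooled covariance matrix invertible in the usual case, which is the obstacle the abstract attributes to high-dimensional, low-sample-size data.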