Feature selection for OPLS discriminant analysis of cancer tissue lipidomics data
Author(s) -
Tokareva Alisa O.,
Chagovets Vitaliy V.,
Starodubtseva Natalia L.,
Nazarova Niso M.,
Nekrasova Maria E.,
Koikhin Alexey S.,
Frankevich Vladimir E.,
Nikolaev Evgeny N.,
Sukhikh Gennady T.
Publication year - 2020
Publication title - Journal of Mass Spectrometry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.475
H-Index - 121
eISSN - 1096-9888
pISSN - 1076-5174
DOI - 10.1002/jms.4457
Subject(s) - lipidomics , opls , chemistry , linear discriminant analysis , random forest , feature selection , support vector machine , mass spectrometry , pattern recognition (psychology) , artificial intelligence , principal component analysis , biological system , lipidome , computational biology , receiver operating characteristic , chromatography , analytical chemistry (journal) , statistics , mathematics , biochemistry , molecular dynamics , computational chemistry , computer science , water model , biology
Mass spectrometry-based molecular profiling can improve differentiation between normal and cancer tissues and aid the detection of neoplastic transformation, which is important for diagnosing a pathology, predicting its progression, and developing a treatment strategy. The aim of the present study is to evaluate tissue classification approaches based on various data sets derived from the molecular profile of organic solvent extracts of a tissue. Several feature sets are considered as input for orthogonal projections to latent structures discriminant analysis (OPLS-DA): all mass spectrometric peaks above a 300-count threshold, a subset of peaks selected by ranking with a support vector machine algorithm, peaks selected by a random forest algorithm, peaks with a statistically significant difference in intensity as determined by the Mann-Whitney U test, peaks identified as lipids, and peaks that are both identified and significantly different. The best predictive potential is obtained for the OPLS-DA model built on nonpolar glycerolipids (Q² = 0.64, area under the curve [AUC] = 0.95); the second best is the OPLS-DA model with lipid peaks selected by the random forest algorithm (Q² = 0.58, AUC = 0.87). Moreover, models based on particular molecular classes are preferable from a biological point of view, as they can yield new explanatory mechanisms of pathophysiology and enable pathway analysis. Other promising features for OPLS-DA modeling are phosphatidylethanolamines (Q² = 0.48, AUC = 0.86).
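To make one of the selection routes listed in the abstract concrete, the sketch below filters peaks by an intensity threshold and then by a two-sided Mann-Whitney U test. It is an illustration only, not the authors' code. The function names, the rule that a peak passes the threshold if any sample exceeds 300 counts, the 0.05 significance level, and the toy intensity values are all assumptions. The p-value uses the normal approximation without tie or multiple-testing correction, which a real analysis would likely need. The OPLS-DA step itself is not shown.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U statistic of group a, approximate p-value).
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def select_peaks(normal, tumour, min_counts=300, alpha=0.05):
    """Return indices of peaks that pass the intensity and significance filters.

    `normal` and `tumour` are lists of samples; each sample is a list of
    peak intensities in the same peak order (hypothetical data layout).
    """
    selected = []
    for j in range(len(normal[0])):
        a = [sample[j] for sample in normal]
        b = [sample[j] for sample in tumour]
        if max(a + b) <= min_counts:  # assumed threshold rule
            continue
        _, p = mann_whitney_u(a, b)
        if p < alpha:
            selected.append(j)
    return selected


# Toy data, invented for illustration: 10 samples per group, 3 peaks.
# Peak 0 separates the groups, peak 1 does not, peak 2 is below threshold.
normal = [[1000 + i, 500 + 2 * i, 10 + i] for i in range(10)]
tumour = [[2000 + i, 501 + 2 * i, 100 + i] for i in range(10)]

selected = select_peaks(normal, tumour)
print(selected)  # -> [0]
```

The selected peak indices would then define the columns of the intensity matrix passed to a discriminant model, so that different selection routes can be compared on the same model-quality metrics.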