Discovery of false identification using similarity difference in GC–MS‐based metabolomics
Author(s) - Kim Seongho, Zhang Xiang
Publication year - 2015
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2665
Subject(s) - similarity (geometry) , identification (biology) , matching (statistics) , mass spectrometry , mass spectrum , metabolomics , pattern recognition (psychology) , mathematics , artificial intelligence , computer science , statistics , chemistry , chromatography , biology , botany , image (mathematics)
Compound identification is a critical process in metabolomics. The widely used approach for compound identification in gas chromatography–mass spectrometry‐based metabolomics is spectrum matching, in which the mass spectral similarity between an experimental mass spectrum and each mass spectrum in a reference library is calculated. While various similarity measures have been developed to improve the overall accuracy of compound identification, little attention has been paid to reducing the false discovery rate. We therefore develop an approach for controlling the false identification rate using the distribution of the difference between the first and second highest spectral similarity scores. We further propose a model‐based approach for achieving a desired true positive rate. The developed method is applied to the National Institute of Standards and Technology mass spectral library, and its performance is compared with that of the conventional approach, which uses only the maximum spectral similarity score. The results show that the developed method achieves a significantly higher F1 score and positive predictive value than the conventional approach. Copyright © 2014 John Wiley & Sons, Ltd.
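To make the contrast between the two decision rules concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes spectra are stored as hypothetical `{m/z: intensity}` dictionaries and uses cosine (normalized dot-product) similarity as a stand-in for whichever similarity measure is actually used. The conventional rule accepts the top library hit when its maximum score clears a threshold. The gap rule accepts it only when the difference between the first and second highest scores clears a threshold. The fixed thresholds here replace the paper's distribution-based and model-based threshold selection. The helper names (`top_two`, `evaluate`) and the TP/FP/FN bookkeeping are illustrative assumptions.

```python
import math

# Hypothetical spectrum representation: nominal m/z -> intensity.
Spectrum = dict[int, float]


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Cosine (normalized dot-product) similarity; an assumed stand-in measure."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_two(query: Spectrum, library: dict[str, Spectrum]) -> tuple[str, float, float]:
    """Return (best library ID, best score, best score minus second-best score)."""
    ranked = sorted(
        ((cosine_similarity(query, ref), cid) for cid, ref in library.items()),
        reverse=True,
    )
    best_score, best_id = ranked[0]
    second_score = ranked[1][0] if len(ranked) > 1 else 0.0
    return best_id, best_score, best_score - second_score


def evaluate(results, threshold: float, use_gap: bool) -> tuple[float, float]:
    """Compute (PPV, F1) over (best_id, score, gap, true_id) tuples.

    Assumed bookkeeping: an accepted correct hit is a TP, an accepted wrong
    hit is an FP, and a rejected hit that was actually correct is an FN.
    """
    tp = fp = fn = 0
    for best_id, score, gap, true_id in results:
        statistic = gap if use_gap else score  # gap rule vs. max-score rule
        accepted = statistic >= threshold
        correct = best_id == true_id
        if accepted and correct:
            tp += 1
        elif accepted:
            fp += 1
        elif correct:
            fn += 1
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    return ppv, f1


# Toy example with hypothetical spectra: "B" is a near-duplicate of "A".
library = {
    "A": {41: 100.0, 43: 50.0, 57: 20.0},
    "B": {41: 90.0, 43: 60.0, 57: 25.0},
    "C": {77: 100.0, 51: 40.0},
}
query = {41: 98.0, 43: 52.0, 57: 19.0}
best_id, score, gap = top_two(query, library)
print(best_id, round(score, 4), round(gap, 4))
```

In the toy example the top hit scores very high, but the runner-up scores almost as high. A max-score threshold such as 0.9 would accept the hit, while a gap threshold such as 0.05 would flag it as ambiguous. This is the kind of near-tie that a score-difference rule is meant to catch.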