Coherent point drift peak alignment algorithms using distance and similarity measures for two‐dimensional gas chromatography mass spectrometry data
Author(s) - Li Zeyu, Kim Seongho, Zhong Sikai, Zhong Zichun, Kato Ikuko, Zhang Xiang
Publication year - 2020
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.3236
Subject(s) - algorithm , mass spectrometry , similarity (geometry) , preprocessor , two dimensional gas , homogeneous , matching (statistics) , matrix (chemical analysis) , computer science , chemistry , mathematics , artificial intelligence , chromatography , image (mathematics) , statistics , combinatorics
Peak alignment is a vital preprocessing step before downstream analysis, such as biomarker discovery and pathway analysis, of two‐dimensional gas chromatography mass spectrometry (2DGCMS)–based metabolomics data. Because of uncontrollable experimental conditions, such as differences in temperature or pressure, matrix effects on samples, and stationary phase degradation, retention times inevitably shift among samples during 2DGCMS experiments, which makes peak alignment difficult. Various peak alignment algorithms have been developed to correct retention time shifts in homogeneous data, heterogeneous data, or both types of mass spectrometry data. However, almost all existing algorithms focus on local alignment and suffer from low accuracy, especially when aligning dense biological data with many peaks. We have developed four global peak alignment (GPA) algorithms based on the coherent point drift (CPD) point matching algorithm: retention time‐based CPD‐GPA (RT), prior CPD‐GPA (P), mixture CPD‐GPA (M), and prior mixture CPD‐GPA (P + M). Method RT aligns peaks using only the retention time distance, while methods P, M, and P + M use both the retention time distance and the mass spectral similarity. Method P incorporates the mass spectral similarity as prior information, and methods M and P + M use a mixture distance measure. The four algorithms were applied to homogeneous and heterogeneous spiked‐in data as well as two real biological datasets, and were compared with three existing algorithms: mSPA, SWPA, and BiPACE‐2D. The results show that our CPD‐GPA algorithms outperform all three existing algorithms in terms of F1 score.
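The abstract does not give the form of the mixture distance used by methods M and P + M. As a purely illustrative sketch, the function below assumes a convex combination of a scaled Euclidean distance between two-dimensional retention time coordinates and a cosine dissimilarity between mass spectra. The names `mixture_distance`, `weight`, and `rt_scale` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (N x K) and rows of b (M x K)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T  # shape (N, M)


def mixture_distance(rt_x, rt_y, ms_x, ms_y, weight=0.5, rt_scale=1.0):
    """Hypothetical mixture distance (assumption, not the paper's formula).

    rt_x: (N, 2) first/second-dimension retention times of peaks in sample X.
    rt_y: (M, 2) retention times of peaks in sample Y.
    ms_x, ms_y: (N, K) and (M, K) mass spectra on a shared m/z grid.
    Returns an (N, M) matrix mixing retention time distance and spectral dissimilarity.
    """
    diff = rt_x[:, None, :] - rt_y[None, :, :]          # (N, M, 2)
    rt_dist = np.sqrt((diff ** 2).sum(axis=2)) / rt_scale
    spec_dist = 1.0 - cosine_similarity(ms_x, ms_y)     # 0 for identical spectra
    return (1.0 - weight) * rt_dist + weight * spec_dist
```

Setting `weight=0` reduces this sketch to a pure retention time distance, which mirrors the distinction the abstract draws between method RT and the spectrum-aware methods.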
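For readers unfamiliar with coherent point drift, the core of CPD (Myronenko and Song) is an EM loop whose E-step computes soft correspondences between two point sets under a Gaussian mixture with a uniform outlier component. The sketch below shows only that standard E-step on retention time coordinates; the transformation update (M-step) is omitted. The optional `prior` argument is a hypothetical illustration of how spectral similarity could enter as prior mixing weights. It is an assumption for exposition, not the paper's method P.

```python
import numpy as np


def cpd_posterior(X, Y, sigma2, w=0.1, prior=None):
    """Standard CPD E-step with optional per-pair prior weights.

    X: (N, D) data points (e.g. peak retention times in one sample).
    Y: (M, D) GMM centroids (e.g. peak retention times in the reference).
    sigma2: isotropic Gaussian variance; w: outlier weight in [0, 1).
    prior: optional (M, N) nonnegative mixing weights (hypothetical; e.g.
           built from spectral similarity). Defaults to uniform 1/M.
    Returns P with P[m, n] = posterior that X[n] belongs to centroid Y[m].
    """
    N, D = X.shape
    M = Y.shape[0]
    sq = ((X[None, :, :] - Y[:, None, :]) ** 2).sum(axis=2)  # (M, N) squared distances
    kernel = np.exp(-sq / (2.0 * sigma2))
    if prior is None:
        prior = np.full((M, N), 1.0 / M)
    else:
        prior = prior / prior.sum(axis=0, keepdims=True)  # each column sums to 1
    numerator = prior * kernel
    # Uniform outlier term: (2*pi*sigma2)^(D/2) * w / ((1 - w) * N)
    outlier = (2.0 * np.pi * sigma2) ** (D / 2.0) * w / ((1.0 - w) * N)
    return numerator / (numerator.sum(axis=0, keepdims=True) + outlier)
```

With a uniform prior this reduces to the usual CPD posterior. With a very large `sigma2` the retention time kernel becomes flat and the posterior follows the prior, which is one way to see how a similarity-based prior could break ties in dense regions.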
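The comparison with mSPA, SWPA, and BiPACE‐2D is reported in terms of F1 score. A minimal sketch of such a score for peak alignment follows. It assumes, as a hypothetical convention not stated in the abstract, that an alignment is represented as a set of (peak in sample A, peak in sample B) pairs and scored against known ground-truth pairs.

```python
def alignment_f1(predicted, truth):
    """F1 score of predicted aligned peak pairs against ground-truth pairs.

    predicted, truth: iterables of hashable pairs such as (peak_id_a, peak_id_b).
    Precision = correct pairs / predicted pairs; recall = correct pairs / true pairs.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For example, two correct pairs out of three predicted, against four true pairs, give precision 2/3, recall 1/2, and F1 = 4/7.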
