A hybrid retention time alignment algorithm for SWATH‐MS data
Author(s) - Wu Long, Amon Sabine, Lam Henry
Publication year - 2016
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.201500511
Recently, data‐independent acquisition (DIA) MS has gained popularity as a qualitative–quantitative workflow for proteomics. One outstanding problem in the analysis of DIA‐MS data is the alignment of chromatographic retention times across multiple samples, which facilitates peptide identification and accurate quantification. Here, we present a novel hybrid (profile‐based and feature‐based) algorithm for LC‐MS alignment and test it on data from sequential windowed acquisition of all theoretical fragment ion mass spectra (SWATH), a type of DIA. Our algorithm first uses a profile‐based dynamic time warping algorithm to obtain a coarse alignment that corrects large retention time shifts, and then uses a feature‐based bipartite matching algorithm to match individual features at a fine scale. We evaluated our method by comparing our aligned feature pairs with the peptide identification results of pseudo‐MS2 spectra exported by DIA‐Umpire, a recently reported tool for deconvoluting DIA‐MS data. We propose that our method can be used to align DIA‐MS data prior to identification, and that the alignment can be used to remove noise peaks or to screen for differentially changed features. We found that a simple alignment‐enabled denoising scheme can reduce the number of pseudo‐MS2 spectra exported by DIA‐Umpire by up to about 40% while retaining a comparable number of identifications. Finally, we demonstrated the utility of our tool for accurate label‐free relative quantification across multiple SWATH runs.
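The two-stage idea (coarse profile-based warping, then fine one-to-one feature matching) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the DTW distance, the feature score, or the matching objective, so the following are assumptions. DTW uses absolute intensity differences on one sampled profile per run (for example a total ion chromatogram); the warping path is turned into a piecewise-linear retention time map; and bipartite matching is done as a minimum-cost assignment (Hungarian algorithm) restricted to pairs inside hypothetical `rt_tol` and `mz_tol` windows. All function names and the toy data are hypothetical.

```python
# Hypothetical sketch: names, cost functions and tolerances are assumptions, not the published method.
import bisect

def dtw_path(a, b):
    """Classic DTW on two intensity profiles; returns the optimal warping path as (i, j) index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance between profile points
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end of both profiles to the start.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1), (D[i - 1][j], i - 1, j), (D[i][j - 1], i, j - 1))
    return path[::-1]

def rt_warp(times_a, times_b, path):
    """Turn a DTW path into a function mapping a run-B retention time onto run A's time axis."""
    hits = {}
    for i, j in path:
        hits.setdefault(j, []).append(times_a[i])
    mapped = [sum(hits[j]) / len(hits[j]) for j in range(len(times_b))]

    def warp(rt):
        if rt <= times_b[0]:
            return mapped[0]
        if rt >= times_b[-1]:
            return mapped[-1]
        k = bisect.bisect_right(times_b, rt) - 1  # linear interpolation between scans
        w = (rt - times_b[k]) / (times_b[k + 1] - times_b[k])
        return mapped[k] + w * (mapped[k + 1] - mapped[k])
    return warp

def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m; returns {row: col}."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v, p, way = [0.0] * (n + 1), [0.0] * (m + 1), [0] * (m + 1), [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}

def match_features(feats_a, feats_b, warp, rt_tol=0.5, mz_tol=0.01):
    """Fine-scale one-to-one matching of (rt, mz) features after coarse warping."""
    if not feats_a or not feats_b:
        return []
    big = 1e9  # cost for pairs outside the tolerance window; such pairs are discarded
    cost = []
    for rt_a, mz_a in feats_a:
        row = []
        for rt_b, mz_b in feats_b:
            d_rt, d_mz = abs(rt_a - warp(rt_b)), abs(mz_a - mz_b)
            ok = d_rt <= rt_tol and d_mz <= mz_tol
            row.append(d_rt / rt_tol + d_mz / mz_tol if ok else big)
        cost.append(row)
    flip = len(feats_a) > len(feats_b)  # hungarian() needs rows <= columns
    if flip:
        cost = [list(col) for col in zip(*cost)]
    pairs = [(c, r) if flip else (r, c)
             for r, c in hungarian(cost).items() if cost[r][c] < big]
    return sorted(pairs)

# Toy usage: run B's peaks elute about one minute later than run A's.
times = [float(t) for t in range(8)]
profile_a = [0, 1, 5, 1, 0, 3, 0, 0]
profile_b = [0, 0, 1, 5, 1, 0, 3, 0]
warp = rt_warp(times, times, dtw_path(profile_a, profile_b))
feats_a = [(3.0, 500.25), (5.0, 612.30)]
feats_b = [(4.05, 500.252), (6.02, 612.305), (2.0, 450.10)]
print(match_features(feats_a, feats_b, warp))  # [(0, 0), (1, 1)]
```

In the toy run, the DTW stage absorbs the one-minute lag, so the matching stage only has to resolve small residual differences, and the unmatched run-B feature at 2.0 min is left out rather than forced into a pair. Encoding out-of-tolerance pairs as a large finite cost keeps the assignment well defined while still letting them be filtered out afterwards.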