
Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network
Author(s) - Kaiyuan Liu, Sujun Li, Lei Wang, Yuzhen Ye, Haixu Tang
Publication year - 2020
Publication title - Analytical Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.117
H-Index - 332
eISSN - 1520-6882
pISSN - 0003-2700
DOI - 10.1021/acs.analchem.9b04867
Subject(s) - chemistry , ion , spectral line , fragmentation (computing) , mass spectrum , dissociation (chemistry) , tandem mass spectrometry , tandem , electron ionization , mass spectrometry , analytical chemistry (journal) , pattern recognition (psychology) , artificial intelligence , ionization , computer science , chromatography , physics , astronomy , materials science , organic chemistry , composite material , operating system
The ability to predict tandem mass (MS/MS) spectra from peptide sequences can significantly enhance our understanding of the peptide fragmentation process and could improve peptide identification in proteomics. However, current approaches for predicting high-energy collisional dissociation (HCD) spectra are limited to predicting the intensities of expected ion types, that is, the a/b/c/x/y/z ions and their neutral loss derivatives (referred to as backbone ions). In practice, backbone ions account for less than 70% of the total ion intensity in HCD spectra, indicating that many intense ions are ignored by current predictors. In this paper, we present a deep learning approach that can predict the complete spectra (both backbone and nonbackbone ions) directly from peptide sequences. We make no assumptions about which kinds of ions to predict but instead predict the intensities for all possible m/z values. Training this model requires neither annotations of fragment ions nor any prior knowledge of the fragmentation rules. Our analyses show that the predicted 2+ and 3+ HCD spectra are highly similar to the experimental spectra, with average full-spectrum cosine similarities of 0.820 (±0.088) and 0.786 (±0.085), respectively, very close to the similarities between experimental replicate spectra. In contrast, the best-performing backbone-only models achieve average similarities below 0.75 and 0.70 for 2+ and 3+ spectra, respectively. Furthermore, we developed a multitask learning (MTL) approach for predicting spectra for which training samples are insufficient, which allows our model to make accurate predictions for electron transfer dissociation (ETD) spectra and HCD spectra of less abundant charge states (1+ and 4+).
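The full-spectrum cosine similarity reported in the abstract can be illustrated with a minimal sketch. The abstract does not specify how spectra are discretized, so the fixed-width binning, the 0.1 m/z bin width, the 2000 m/z upper limit, and the function names `bin_spectrum`, `cosine_similarity`, and `full_spectrum_similarity` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def bin_spectrum(peaks, bin_width=0.1, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs.
    bin_width and max_mz are illustrative assumptions, not values from the paper.
    """
    n_bins = int(round(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:  # peaks outside the range are dropped
            vec[idx] += intensity
    return vec

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty spectrum is treated as having no similarity
    return dot / (norm_u * norm_v)

def full_spectrum_similarity(predicted, experimental,
                             bin_width=0.1, max_mz=2000.0):
    """Compare two spectra over every m/z bin, not only annotated backbone ions."""
    return cosine_similarity(
        bin_spectrum(predicted, bin_width, max_mz),
        bin_spectrum(experimental, bin_width, max_mz),
    )

if __name__ == "__main__":
    # Hypothetical peak lists, chosen only to exercise the functions.
    experimental = [(175.119, 100.0), (300.55, 50.0)]
    predicted = [(175.119, 100.0)]
    print(round(full_spectrum_similarity(predicted, experimental), 4))
```

Because every bin contributes to the score, a predictor that omits an intense nonbackbone peak is penalized, which is the reason a full-spectrum score can be lower for backbone-only models than a score computed on annotated ions alone.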