
How much peptide sequence information is contained in ion trap tandem mass spectra?
Author(s) - Jürgen Cox, Nina C. Hubner, Matthias Mann
Publication year - 2008
Publication title - Journal of the American Society for Mass Spectrometry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.961
H-Index - 127
eISSN - 1879-1123
pISSN - 1044-0305
DOI - 10.1016/j.jasms.2008.07.024
Subject(s) - chemistry, tandem mass spectrometry, peptide sequence, bottom up proteomics, proteome, peptide, human proteome project, ion trap, mass spectrum, sequence (biology), peptide mass fingerprinting, computational biology, tandem mass tag, isobaric labeling, amino acid, proteomics, mass spectrometry, protein mass spectrometry, quantitative proteomics, biochemistry, biology, chromatography, gene
Matching peptide tandem mass spectra to their cognate amino acid sequences in databases is a key step in proteomics. It is usually performed by assigning a score to a spectrum-sequence combination. De novo sequencing or partial de novo sequencing is useful for organisms without a sequenced genome or for peptides with unexpected modifications. Here we use a very large, high-accuracy proteomic dataset to investigate how much peptide sequence information is present in tandem mass spectra generated in a linear ion trap (LTQ). More than 400,000 identified tandem mass spectra from a single human cancer cell line project were assigned to 26,896 distinct peptide sequences. The average absolute fragment mass accuracy is 0.102 Da. There are on average about four complementary b- and y-ions; both series are equally represented, but y-ions are 2- to 3-fold more intense up to mass 1000. Half of all spectra contain uninterrupted b- or y-ion series of at least six amino acids, and combining b- and y-ion information yields sequences of seven amino acids on average. These sequences are almost always unique in the human proteome, even without using any precursor or peptide sequence tag information. Thus, optimal de novo sequencing algorithms should be able to obtain substantial sequence information in at least half of all cases.
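
The notion of an uninterrupted b- or y-ion series can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes singly charged fragments, standard monoisotopic residue masses, unmodified peptides, and a hypothetical matching tolerance of 0.5 Da (the abstract reports only the average fragment mass accuracy, not a matching window). The function names `fragment_mz` and `longest_run` are illustrative, and the run is counted in consecutive matched fragment ions, which may differ from the paper's exact way of counting amino acids.

```python
from itertools import accumulate

# Standard monoisotopic residue masses in Da (unmodified amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da


def fragment_mz(peptide):
    """Return singly charged b- and y-ion m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [MONO[aa] for aa in peptide]
    prefix = list(accumulate(masses))[:-1]            # N-terminal sums
    suffix = list(accumulate(reversed(masses)))[:-1]  # C-terminal sums
    b_ions = [m + PROTON for m in prefix]
    y_ions = [m + WATER + PROTON for m in suffix]
    return b_ions, y_ions


def longest_run(theoretical, peaks, tol=0.5):
    """Length of the longest run of consecutive theoretical ions that each
    have an observed peak within +/- tol Da (tol is an assumed value)."""
    best = current = 0
    for mz in theoretical:
        if any(abs(mz - p) <= tol for p in peaks):
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Toy example: a made-up peak list containing b2..b5 and y1 of "PEPTIDE".
b_ions, y_ions = fragment_mz("PEPTIDE")
observed = b_ions[1:5] + [y_ions[0]]
print("longest b run:", longest_run(b_ions, observed))
print("longest y run:", longest_run(y_ions, observed))
```

Applied to every identified spectrum in a dataset, a count of this kind would give the distribution of uninterrupted series lengths that the abstract summarizes. Real spectra would additionally require handling of multiply charged fragments, modifications, and noise peaks, none of which this sketch covers.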