A workflow to increase the detection rate of proteins from unsequenced organisms in high‐throughput proteomics experiments
Author(s) - Grossmann Jonas, Fischer Bernd, Baerenfaller Katja, Owiti Judith, Buhmann Joachim M., Gruissem Wilhelm, Baginsky Sacha
Publication year - 2007
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.200700474
Subject(s) - proteome, computational biology, workflow, proteomics, genome, identification (biology), biology, sequence assembly, computer science, bioinformatics, genetics, database, gene, transcriptome, botany, gene expression
We present and evaluate a strategy for the mass spectrometric identification of proteins from organisms for which no genome sequence information is available. The strategy incorporates cross‐species information from sequenced organisms. It combines spectrum quality scoring, de novo sequencing and error‐tolerant BLAST searches, and is designed to decrease input data complexity. Spectral quality scoring reduces the number of investigated mass spectra without a loss of information. Stringent quality‐based selection and the combination of different de novo sequencing methods substantially increase the catalog of significant peptide alignments. The de novo sequences that pass a reliability filter are subsequently submitted to error‐tolerant BLAST searches, and MS‐BLAST hits are validated by a sampling technique. With the described workflow, we identified up to 20% more groups of homologous proteins in proteome analyses of organisms whose genomes are not sequenced than by state‐of‐the‐art database searches in an Arabidopsis thaliana database. We consider the novel data analysis workflow an excellent screening method to identify those proteins that evade detection in proteomics experiments as a result of database constraints.
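
The two filtering stages named in the abstract (quality-based spectrum selection, then a reliability filter on de novo sequences pooled from several tools) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Spectrum` and `DeNovoCall` types, the 0-to-1 score scales, the threshold values and the function names are all hypothetical and chosen only for illustration. The abstract does not state how scores are computed or which cutoffs were used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    spectrum_id: str
    quality: float  # hypothetical spectrum quality score, assumed in [0, 1]


@dataclass(frozen=True)
class DeNovoCall:
    spectrum_id: str
    tool: str          # name of the de novo sequencing tool (placeholder)
    peptide: str       # proposed peptide sequence
    confidence: float  # hypothetical reliability score, assumed in [0, 1]


def select_spectra(spectra, min_quality):
    """Keep only spectra whose quality score reaches the threshold."""
    return [s for s in spectra if s.quality >= min_quality]


def blast_queries(spectra, calls, min_quality=0.5, min_confidence=0.7):
    """Pool reliable de novo sequences per retained spectrum.

    Calls from all tools are combined; a sequence is kept if its spectrum
    passed the quality filter and its own confidence passes the
    reliability filter. Thresholds are illustrative assumptions.
    """
    kept_ids = {s.spectrum_id for s in select_spectra(spectra, min_quality)}
    queries = {}
    for call in calls:
        if call.spectrum_id in kept_ids and call.confidence >= min_confidence:
            queries.setdefault(call.spectrum_id, set()).add(call.peptide)
    # Sorted lists give a deterministic query order per spectrum.
    return {sid: sorted(peps) for sid, peps in queries.items()}


if __name__ == "__main__":
    spectra = [Spectrum("s1", 0.9), Spectrum("s2", 0.2)]
    calls = [
        DeNovoCall("s1", "toolA", "PEPTIDEK", 0.8),
        DeNovoCall("s1", "toolB", "PEPTLDEK", 0.9),
        DeNovoCall("s1", "toolB", "AAAK", 0.1),
        DeNovoCall("s2", "toolA", "GGGK", 0.95),
    ]
    print(blast_queries(spectra, calls))
```

In this sketch the surviving sequences per spectrum would then be passed to an error‐tolerant homology search; that step and the sampling-based validation of hits are outside the scope of the example.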