Improving protein identification from peptide mass fingerprinting through a parameterized multi‐level scoring algorithm and an optimized peak detection
Author(s) -
Gras Robin,
Müller Markus,
Gasteiger Elisabeth,
Gay Steven,
Binz Pierre-Alain,
Bienvenut William,
Hoogland Christine,
Sanchez Jean-Charles,
Bairoch Amos,
Hochstrasser Denis F.,
Appel Ron D.
Publication year - 1999
Publication title -
Electrophoresis
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.666
H-Index - 158
eISSN - 1522-2683
pISSN - 0173-0835
DOI - 10.1002/(sici)1522-2683(19991201)20:18<3535::aid-elps3535>3.0.co;2-j
Subject(s) - peptide mass fingerprinting , computer science , identification (biology) , mass spectrometry , algorithm , set (abstract data type) , parameterized complexity , sequence database , pattern recognition (psychology) , data mining , artificial intelligence , chemistry , proteomics , chromatography , biology , biochemistry , botany , gene , programming language
Abstract - We have developed a new algorithm to identify proteins by means of peptide mass fingerprinting. Starting from matrix‐assisted laser desorption/ionization time‐of‐flight (MALDI‐TOF) spectra and environmental data such as species, isoelectric point and molecular weight, as well as chemical modifications or the number of missed cleavages of a protein, the program performs a fully automated identification of the protein. The first step is a peak detection algorithm, which allows precise and fast determination of peptide masses, even if the peaks are of low intensity or overlap. In the second step, the masses and environmental data are used by the identification algorithm to search protein sequence databases (SWISS‐PROT and/or TrEMBL) for protein entries that match the input data. A list of candidate proteins is thus selected from the database, and a score calculation ranks them according to the quality of the match. To define the most discriminating scoring calculation, we analyzed the role of each parameter along two directions. The first is based on filtering and exploratory effects, while the second focuses on the levels at which the parameters intervene in the identification process. According to our analysis, all input parameters contribute to the score, albeit with different weights. Since it is difficult to estimate the weights in advance, they have been computed with a genetic algorithm, using a training set of 91 protein spectra with their environmental data. We tested the resulting scoring calculation on a test set of ten proteins and compared the identification results with those of other peptide mass fingerprinting programs.
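The database search step described in the abstract can be illustrated with a minimal sketch of generic peptide mass fingerprinting. This is not the authors' multi-level scoring algorithm: the function names (`tryptic_peptides`, `mh_mass`, `count_matches`, `rank`), the choice of trypsin as the enzyme, the 50 ppm tolerance and the plain matched-peak count used as the score are all assumptions made for illustration. The paper's score additionally weights environmental data such as species, isoelectric point and molecular weight, which this sketch omits.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # singly protonated ion, as usually observed in MALDI


def tryptic_peptides(sequence, missed_cleavages=1):
    """In silico digest: cleave after K or R, except when followed by P."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` adjacent fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol_ppm=50.0, missed_cleavages=1):
    """Number of observed peaks matching a theoretical peptide mass."""
    theoretical = [mh_mass(p)
                   for p in tryptic_peptides(sequence, missed_cleavages)]
    return sum(
        any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for m in observed
    )


def rank(observed, database, **kwargs):
    """Rank database entries (name -> sequence) by matched-peak count."""
    scores = {name: count_matches(observed, seq, **kwargs)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage with two hypothetical sequences and peaks taken from the first.
database = {"protA": "AKGGR", "protB": "LLLLK"}
observed = [mh_mass("AK"), mh_mass("GGR")]
ranking = rank(observed, database)
```

A matched-peak count is the simplest possible score and discriminates poorly between large proteins; it stands in here only for the ranking step, not for the weighted score the abstract describes.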