Predicting Tryptic Cleavage from Proteomics Data Using Decision Tree Ensembles
Author(s) - Thomas Fannes, Elien Vandermarliere, Leander Schietgat, Sven Degroeve, Lennart Martens, Jan Ramon
Publication year - 2013
Publication title - Journal of Proteome Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.644
H-Index - 161
eISSN - 1535-3907
pISSN - 1535-3893
DOI - 10.1021/pr4001114
Subject(s) - decision tree, trypsin, proteomics, computer science, cleavage (geology), mass spectrometry, shotgun proteomics, chemistry, data mining, chromatography, computational biology, biochemistry, biology, paleontology, fracture (geology), gene, enzyme
Trypsin is the workhorse protease in mass spectrometry-based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction with Decision Trees), an algorithm based on a decision tree ensemble that was learned on publicly available peptide identification data from the PRIDE repository. We demonstrate that CP-DT is able to accurately predict tryptic cleavage: tests on three independent data sets show that CP-DT significantly outperforms the Keil rules that are currently used to predict tryptic cleavage. Moreover, the trees generated by CP-DT can make predictions efficiently and are interpretable by domain experts.
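
To make the idea of mimicking digestion in silico concrete, the following is a minimal sketch of a rule-based digester. It is not CP-DT and not the full Keil rule set. It assumes only the widely used simplification that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P). The function name `digest_simple_rule` and the example sequence are hypothetical and chosen for illustration.

```python
def digest_simple_rule(protein: str) -> list[str]:
    """Split a protein sequence into peptides using a simplified tryptic rule.

    Assumption (not taken from the paper): cleave after K or R,
    except when the following residue is P. Missed cleavages and the
    additional exceptions in the full Keil rules are ignored.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        # A cleavage site lies between position i and i + 1.
        if residue in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the C-terminal peptide.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Hypothetical example sequence: the R at position 6 is followed by P,
# so no cleavage is predicted there.
print(digest_simple_rule("AAKGGRPCCKDD"))
```

A fixed rule like this gives a yes or no answer for every site. A learned model such as the decision tree ensemble described in the abstract can instead take the surrounding residues into account when predicting whether a site is cleaved.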