z-logo
open-access-imgOpen Access
Combining Machine Learning and Homology-Based Approaches to Accurately Predict Subcellular Localization in Arabidopsis
Author(s) -
Rakesh Kaundal,
Reena V. Saini,
Patrick X. Zhao
Publication year - 2010
Publication title -
plant physiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.554
H-Index - 312
eISSN - 1532-2548
pISSN - 0032-0889
DOI - 10.1104/pp.110.156851
Subject(s) - arabidopsis , subcellular localization , computational biology , proteome , biology , arabidopsis thaliana , genome , caenorhabditis elegans , gene , genetics , mutant
A complete map of the Arabidopsis (Arabidopsis thaliana) proteome is clearly a major goal for the plant research community in terms of determining the function and regulation of each encoded protein. Developing genome-wide prediction tools such as for localizing gene products at the subcellular level will substantially advance Arabidopsis gene annotation. To this end, we performed a comprehensive study in Arabidopsis and created an integrative support vector machine-based localization predictor called AtSubP (for Arabidopsis subcellular localization predictor) that is based on the combinatorial presence of diverse protein features, such as its amino acid composition, sequence-order effects, terminal information, Position-Specific Scoring Matrix, and similarity search-based Position-Specific Iterated-Basic Local Alignment Search Tool information. When used to predict seven subcellular compartments through a 5-fold cross-validation test, our hybrid-based best classifier achieved an overall sensitivity of 91% with high-confidence precision and Matthews correlation coefficient values of 90.9% and 0.89, respectively. Benchmarking AtSubP on two independent data sets, one from Swiss-Prot and another containing green fluorescent protein- and mass spectrometry-determined proteins, showed a significant improvement in the prediction accuracy of species-specific AtSubP over some widely used "general" tools such as TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-PLoc, and our newly created All-Plant method. Cross-comparison of AtSubP on six nontrained eukaryotic organisms (rice [Oryza sativa], soybean [Glycine max], human [Homo sapiens], yeast [Saccharomyces cerevisiae], fruit fly [Drosophila melanogaster], and worm [Caenorhabditis elegans]) revealed inferior predictions. AtSubP significantly outperformed all the prediction tools being currently used for Arabidopsis proteome annotation and, therefore, may serve as a better complement for the plant research community. A supplemental Web site that hosts all the training/testing data sets and whole proteome predictions is available at http://bioinfo3.noble.org/AtSubP/.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom