PlasmoSEP: Predicting surface‐exposed proteins on the malaria parasite using semisupervised self‐training and expert‐annotated data
Author(s) - Yasser ElManzalawy, Elyse E. Munoz, Scott E. Lindner, Vasant Honavar
Publication year - 2016
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.201600249
Subject(s) - plasmodium falciparum, proteome, throughput, false positive paradox, proteomics, plasmodium yoelii, identification (biology), malaria, biology, computer science, computational biology, artificial intelligence, bioinformatics, immunology, genetics, gene, telecommunications, botany, parasitemia, wireless
Accurate and comprehensive identification of surface‐exposed proteins (SEPs) in parasites is a key step in developing novel subunit vaccines. However, the reliability of MS‐based high‐throughput methods for proteome‐wide mapping of SEPs continues to be limited due to high rates of false positives (i.e., proteins mistakenly identified as surface exposed) as well as false negatives (i.e., SEPs not detected due to low expression or other technical limitations). We propose a framework called PlasmoSEP for the reliable identification of SEPs using a novel semisupervised learning algorithm that combines SEPs identified by high‐throughput experiments and expert annotation of high‐throughput data to augment labeled data for training a predictive model. Our experiments using high‐throughput data from the Plasmodium falciparum surface‐exposed proteome provide several novel high‐confidence predictions of SEPs in P. falciparum and also confirm expert annotations for several others. Furthermore, PlasmoSEP predicts that 25 of 37 experimentally identified SEPs in Plasmodium yoelii salivary gland sporozoites are likely to be SEPs. Finally, PlasmoSEP predicts several novel SEPs in P. yoelii and Plasmodium vivax malaria parasites that can be validated for further vaccine studies. Our computational framework can be easily adapted to improve the interpretation of data from high‐throughput studies.
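
The general idea of semisupervised self-training mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the PlasmoSEP algorithm itself, whose classifier, features, and use of expert annotations are not described here. It assumes hypothetical numeric feature vectors per protein, a simple nearest-centroid scorer standing in for the real predictive model, and an arbitrary confidence threshold. Each round, unlabeled proteins that the current model scores confidently are moved into the labeled set, and the model is refit on the augmented data.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Generic self-training loop (illustrative assumption, not PlasmoSEP).

    labeled:   list of (features, label) pairs, label 1 = SEP, 0 = non-SEP;
               must contain at least one example of each class
    unlabeled: list of feature tuples with unknown labels
    Returns the augmented labeled list and the still-unlabeled pool.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # "Train" the stand-in model: one centroid per class.
        pos = centroid([x for x, y in labeled if y == 1])
        neg = centroid([x for x, y in labeled if y == 0])
        confident, rest = [], []
        for x in pool:
            d_pos, d_neg = dist(x, pos), dist(x, neg)
            total = d_pos + d_neg
            # Pseudo-probability of the positive class from relative distances.
            p_pos = 0.5 if total == 0 else d_neg / total
            if p_pos >= threshold:
                confident.append((x, 1))
            elif p_pos <= 1 - threshold:
                confident.append((x, 0))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new to add, so further rounds would not change the model
        labeled.extend(confident)
        pool = rest
    return labeled, pool


# Toy data with two made-up features per protein.
seed = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
candidates = [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
augmented, remaining = self_train(seed, candidates)
```

In this toy run the two candidates close to a seed example receive pseudo-labels, while the ambiguous midpoint stays unlabeled. In the setting the abstract describes, expert annotations would presumably supply or vet some of these added labels rather than relying on model confidence alone.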