Network‐based Auto‐probit Modeling for Protein Function Prediction
Author(s) - Jiang Xiaoyu, Gold David, Kolaczyk Eric D.
Publication year - 2011
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.1541-0420.2010.01519.x
Subject(s) - computer science, protein function prediction, data mining, machine learning, artificial intelligence, bayesian network, biological database, biological data, autoregressive model, mathematics, protein function, bioinformatics, statistics, biology, biochemistry, gene
Summary Predicting the functional roles of proteins from various genome‐wide data, such as protein–protein association networks, has become a canonical problem in computational biology. Approaching this task as a binary classification problem, we develop a network‐based extension of the spatial auto‐probit model. In particular, we develop a hierarchical Bayesian probit‐based framework for modeling binary network‐indexed processes, with a latent multivariate conditional autoregressive Gaussian process. The latter makes it easy to incorporate protein–protein association network topologies, either binary or weighted, when modeling protein functional similarity. We use this framework to predict protein functions, where functions are defined as terms in the Gene Ontology (GO) database, a popular and rigorously defined vocabulary for biological functionality. Furthermore, we show how a natural extension of this framework can be used to model and correct for the high percentage of false negative labels in training data derived from GO, a serious shortcoming endemic to biological databases of this type. We evaluate our method and compare it with standard algorithms on weighted yeast protein–protein association networks extracted from a recently developed integrative database, the Search Tool for the Retrieval of INteracting Genes/proteins (STRING). Results show that our basic method is competitive with these other methods, and that the extended method, which incorporates the uncertainty in negative labels among the training data, can yield nontrivial improvements in predictive accuracy.
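
The summary describes the model only at a high level. As a rough illustration, and explicitly not the authors' implementation, the sketch below runs a Gibbs sampler for a simplified network auto-probit model. It uses a latent vector z with conditional autoregressive (CAR) precision τ(D − ρW), where W is a symmetric weighted adjacency matrix and D is its diagonal degree matrix, and sets labels y_i = 1{z_i > 0}. A protein is scored by the fraction of post-burn-in draws with z_i > 0. Several simplifications are assumptions: μ, ρ and τ are fixed rather than given priors, there are no covariates, and the CAR process is used directly as the probit latent variable. The false-negative handling through a hypothetical `fn_rate` parameter is one simple way to express the idea (each true positive is recorded as 0 with that probability, and recorded 1s are trusted); it is not necessarily the paper's formulation. All function and parameter names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf / inv_cdf


def sample_above(a, rng):
    """Draw x ~ N(0, 1) conditioned on x > a."""
    if a < 0.0:
        # Acceptance probability is at least 0.5, so plain rejection is cheap.
        while True:
            x = rng.standard_normal()
            if x > a:
                return x
    # Inverse CDF on the mirrored lower tail avoids cancellation near 1.
    tail = _STD.cdf(-a)
    if tail == 0.0:
        return a  # tail mass underflowed; a is an adequate approximation
    u = tail * (1.0 - rng.random())  # u lies in (0, tail]
    return -_STD.inv_cdf(u)


def gibbs_auto_probit(W, y, mu=0.0, rho=0.9, tau=1.0, fn_rate=0.0,
                      n_iter=2000, burn_in=500, seed=0):
    """Hypothetical sketch: estimate P(z_i > 0) for every node.

    W: symmetric nonnegative weights, zero diagonal, no isolated nodes.
    y: 1 (positive), 0 (recorded negative) or -1 (unlabeled).
    fn_rate: assumed probability that a true positive was recorded as 0.
    """
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    y = np.asarray(y)
    d = W.sum(axis=1)  # node degrees (diagonal of D)
    n = len(y)
    z = np.where(y == 1, 0.5, -0.5)  # starting state
    hits = np.zeros(n)
    for it in range(n_iter):
        for i in range(n):
            # CAR full conditional: N(mu + rho/d_i * sum_j w_ij (z_j - mu), 1/(tau d_i))
            m = mu + rho * (W[i] @ (z - mu)) / d[i]
            s = 1.0 / np.sqrt(tau * d[i])
            a = -m / s  # z_i > 0  <=>  standardized x > a
            if y[i] == -1:
                x = rng.standard_normal()  # unlabeled: no truncation
            elif y[i] == 1:
                x = sample_above(a, rng)  # positive: z_i > 0
            else:
                # Recorded 0: likelihood is 1{z <= 0} + fn_rate * 1{z > 0},
                # so the full conditional is a two-piece truncated normal.
                p_neg = _STD.cdf(a)
                w_pos = fn_rate * (1.0 - p_neg)
                total = p_neg + w_pos
                if total > 0 and rng.random() < w_pos / total:
                    x = sample_above(a, rng)
                else:
                    x = -sample_above(-a, rng)  # z_i <= 0
            z[i] = m + s * x
        if it >= burn_in:
            hits += z > 0
    return hits / (n_iter - burn_in)


if __name__ == "__main__":
    # Two triangles (a "positive" and a "negative" module) joined by a weak edge.
    W = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 0.1, 0, 0],
                  [0, 0, 0.1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]])
    y = np.array([1, 1, -1, -1, 0, 0])
    print(np.round(gibbs_auto_probit(W, y), 2))
```

In this toy example, the unlabeled node attached to the positive module should receive a higher score than the one attached to the negative module. With `fn_rate > 0`, a recorded negative that sits among positives can also receive a nonzero score, which is the intuition behind correcting for false negatives.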
