Heterogeneous data integration by tree‐augmented naïve Bayes for protein–protein interactions prediction
Author(s) - Lin Xiaotong, Chen Xuewen
Publication year - 2013
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.201200326
Subject(s) - naive bayes classifier, computer science, bayes' theorem, classifier (uml), artificial intelligence, machine learning, bayes classifier, bayesian network, bayesian probability, robustness (evolution), data mining, pattern recognition (psychology), computational biology, biology, support vector machine, gene, biochemistry
Most proteins execute their functions by interacting with other proteins. Thus, understanding protein–protein interactions (PPIs) is essential to deciphering biological functions in a living cell. To predict PPIs on a large scale, effective and efficient computational approaches are needed to integrate the heterogeneous data sources provided by advanced technologies. In this paper, we extend our previous work on a Bayesian classifier for human PPI prediction from model organisms by introducing a tree‐augmented naïve Bayes (TAN) classifier. TAN maintains the simplicity and robustness of a naïve Bayes classifier while allowing for dependence among variables. Our empirical results show that, by integrating features extracted from microarray expression measurements, Gene Ontology values, and orthologous scores, TAN achieves higher classification accuracy than both the manually constructed Bayesian network classifier and naïve Bayes. For human PPI prediction, TAN obtains 88% sensitivity while keeping a reasonable 70% specificity on testing samples.
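To illustrate how a TAN classifier can allow dependence among variables while staying close to naïve Bayes, the following minimal sketch builds the feature tree in the commonly used way: weight each feature pair by its conditional mutual information given the class, then take a maximum spanning tree directed away from a root feature. This is an assumption about the general technique, not a reproduction of the paper's implementation; the function names, the choice of root, and the requirement that features are already discretized are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xi, xj, y):
    """Estimate I(Xi; Xj | Y) from equal-length sequences of discrete values."""
    n = len(y)
    c_ijy = Counter(zip(xi, xj, y))
    c_iy = Counter(zip(xi, y))
    c_jy = Counter(zip(xj, y))
    c_y = Counter(y)
    cmi = 0.0
    for (a, b, c), n_abc in c_ijy.items():
        # p(a, b, c) * log( p(a, b | c) / (p(a | c) * p(b | c)) )
        cmi += (n_abc / n) * log(n_abc * c_y[c] / (c_iy[(a, c)] * c_jy[(b, c)]))
    return cmi


def tan_structure(features, y, root=0):
    """Return {child: parent} edges among features (hypothetical helper).

    `features` is a list of columns, one sequence per discretized feature.
    In the full TAN model every feature also has the class as a parent.
    """
    d = len(features)
    weights = {
        (i, j): conditional_mutual_information(features[i], features[j], y)
        for i, j in combinations(range(d), 2)
    }
    parent = {}
    in_tree = {root}
    # Prim's algorithm for a maximum spanning tree, grown outward from the root.
    while len(in_tree) < d:
        src, dst = max(
            ((i, j) for i in in_tree for j in range(d) if j not in in_tree),
            key=lambda e: weights[(min(e), max(e))],
        )
        parent[dst] = src
        in_tree.add(dst)
    return parent


# Toy data (made up for illustration): f1 duplicates f0, f2 is unrelated.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 1, 0, 1, 0, 1, 0, 1]
f1 = list(f0)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
edges = tan_structure([f0, f1, f2], y)
```

In this toy case the strongly dependent pair (f0, f1) is linked directly in the tree, which is the kind of dependence a plain naïve Bayes model would ignore. Each feature ends up with at most one feature parent in addition to the class, so parameter estimation stays nearly as simple as in naïve Bayes.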