
Bayesian parameter estimation for automatic annotation of gene functions using observational data and phylogenetic trees
Author(s) -
George G. Vega Yon,
Duncan C. Thomas,
John L. Morrison,
Huaiyu Mi,
Paul D. Thomas,
Paul Marjoram
Publication year - 2021
Publication title -
PLOS Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1007948
Subject(s) - phylogenetic tree, Markov chain Monte Carlo, Bayesian probability, computer science, annotation, Bayes' theorem, function (biology), Markov chain, machine learning, artificial intelligence, data mining, computational biology, biology, gene, evolutionary biology, genetics
Gene function annotation is important for a variety of downstream analyses of genetic data. However, experimental characterization of function remains costly and slow, making computational prediction an important endeavor. Phylogenetic approaches to prediction have been developed, but implementing a practical Bayesian framework for parameter estimation remains an outstanding challenge. We have developed a computationally efficient model of the evolution of gene annotations on phylogenies, set in a Bayesian framework that uses Markov chain Monte Carlo (MCMC) for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well in leave-one-out cross-validation, and we further validated some of the predictions against the experimental scientific literature.
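
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how parameters of a function-evolution model on a phylogeny might be estimated with MCMC. It assumes a heavily simplified model: a single binary function, one gain probability and one loss probability per branch, a root probability, error-free annotations, and flat priors on (0, 1). The tree, the leaf annotations, and all names (`TREE`, `LEAVES`, `partial_likelihood`, `log_likelihood`, `metropolis`) are hypothetical and chosen only for illustration. The likelihood is computed with a standard pruning recursion, and the sampler is a plain random-walk Metropolis algorithm.

```python
import math
import random

# Hypothetical toy tree: internal nodes map to their children; leaves are not keys.
TREE = {"root": ["a", "b"], "a": ["g1", "g2"], "b": ["g3", "g4"]}
# Hypothetical leaf annotations: 1 = has function, 0 = lacks it, None = unannotated.
LEAVES = {"g1": 1, "g2": 1, "g3": 0, "g4": None}


def partial_likelihood(node, gain, loss):
    """Return [P(data below node | state 0), P(data below node | state 1)]."""
    if node not in TREE:
        obs = LEAVES[node]
        if obs is None:
            # An unannotated leaf is compatible with either state.
            return [1.0, 1.0]
        return [1.0 if obs == 0 else 0.0, 1.0 if obs == 1 else 0.0]
    # trans[s][t] = P(child state t | parent state s)
    trans = [[1.0 - gain, gain], [loss, 1.0 - loss]]
    out = [1.0, 1.0]
    for child in TREE[node]:
        child_lik = partial_likelihood(child, gain, loss)
        for s in (0, 1):
            out[s] *= sum(trans[s][t] * child_lik[t] for t in (0, 1))
    return out


def log_likelihood(gain, loss, root_prob):
    """Log-probability of the leaf annotations, summing over the root state."""
    lik0, lik1 = partial_likelihood("root", gain, loss)
    return math.log((1.0 - root_prob) * lik0 + root_prob * lik1)


def metropolis(n_steps=2000, step=0.1, seed=1):
    """Random-walk Metropolis over (gain, loss, root_prob) with flat priors on (0, 1)."""
    rng = random.Random(seed)
    current = [0.5, 0.5, 0.5]
    current_ll = log_likelihood(*current)
    samples = []
    for _ in range(n_steps):
        proposal = [p + rng.uniform(-step, step) for p in current]
        # Proposals outside (0, 1) have zero prior density and are rejected.
        if all(0.0 < p < 1.0 for p in proposal):
            proposal_ll = log_likelihood(*proposal)
            accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
            if rng.random() < accept_prob:
                current, current_ll = proposal, proposal_ll
        samples.append(tuple(current))
    return samples


if __name__ == "__main__":
    draws = metropolis()
    kept = draws[len(draws) // 2:]  # discard the first half as burn-in
    means = [sum(d[i] for d in kept) / len(kept) for i in range(3)]
    print("posterior means (gain, loss, root_prob):", means)
```

With only four leaves the posterior here stays close to the prior. In this simplified setting, pooling information across many trees would amount to summing the per-tree log-likelihoods inside `log_likelihood` before the accept/reject step.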