Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient for Statistical Phylogenetics
Author(s) -
Xiang Ji,
Zhenyu Zhang,
Andrew J. Holbrook,
Akihiko Nishimura,
Guy Baele,
Andrew Rambaut,
Philippe Lemey,
Marc A. Suchard
Publication year - 2020
Publication title -
molecular biology and evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.637
H-Index - 218
eISSN - 1537-1719
pISSN - 0737-4038
DOI - 10.1093/molbev/msaa130
Subject(s) - phylogenetic tree , biology , inference , computational phylogenetics , bottleneck , phylogenetics , bayesian probability , tree (set theory) , statistical inference , algorithm , sequence (biology) , bayesian inference , phylogenetic network , evolutionary biology , mathematics , statistics , computer science , combinatorics , genetics , artificial intelligence , gene , embedded system
Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N2) operations, where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom