
Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes
Author(s) -
George Armstrong,
Kalen Cantrell,
Shi Huang,
Daniel McDonald,
Niina Haiminen,
Anna Paola Carrieri,
Qiyun Zhu,
Antonio González,
Imran McGrath,
Kristen L. Beck,
Daniel Hakim,
Aki S. Havulinna,
Guillaume Méric,
Teemu J. Niiranen,
Leo Lahti,
Veikko Salomaa,
Mohit Jain,
Michael Inouye,
Austin D. Swafford,
Ho-Cheol Kim,
Laxmi Parida,
Yoshiki Vázquez-Baeza,
Rob Knight
Publication year - 2021
Publication title -
Genome Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.556
H-Index - 297
eISSN - 1549-5469
pISSN - 1088-9051
DOI - 10.1101/gr.275777.121
Subject(s) - phylogenetic diversity, phylogenetic tree, biology, metric (unit), faith, diversity (politics), set (abstract data type), biodiversity, microbiome, metagenomics, scale (ratio), evolutionary biology, phylogenetics, computational biology, computer science, bioinformatics, ecology, genetics, gene, engineering, geography, cartography, philosophy, operations management, theology, sociology, anthropology, programming language
The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a widely used phylogenetic alpha diversity metric that has thus far failed to scale effectively to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the computational resources required, resulting in more accessible software with a smaller carbon footprint than previous approaches. The new algorithm produces results identical to those of the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.
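
For readers unfamiliar with the metric, the following is a minimal sketch of what Faith's PD measures. It is not the SFPhD algorithm from the paper. It is a naive, per-sample computation that assumes the rooted convention, in which a sample's PD is the total length of all branches on the paths from its observed tips to the root, with shared branches counted once. The toy tree, its branch lengths, and the names `TREE` and `faith_pd` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy tree: node -> (parent, length of the branch to the parent).
# The root has no parent and contributes no branch length.
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "A": ("root", 1.0),
    "B": ("root", 2.0),
    "t1": ("A", 0.5),
    "t2": ("A", 0.5),
    "t3": ("B", 1.0),
}


def faith_pd(tree: Dict[str, Tuple[Optional[str], float]],
             observed: Iterable[str]) -> float:
    """Naive rooted Faith's PD: sum each branch on a tip-to-root path once."""
    visited = set()
    total = 0.0
    for tip in observed:
        node: Optional[str] = tip
        # Walk toward the root, stopping at the first node already counted,
        # because everything above it has been counted too.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


if __name__ == "__main__":
    print(faith_pd(TREE, ["t1", "t2"]))  # tips sharing the branch above A
    print(faith_pd(TREE, ["t1", "t3"]))  # tips from both sides of the root
```

In this toy tree, a sample containing `t1` and `t2` scores 2.0, while a sample containing `t1` and `t3` scores 4.5. Both samples hold two taxa, so a richness count cannot separate them; the difference comes entirely from the phylogeny. Repeating this walk independently for every sample is the kind of per-sample work that becomes costly on trees with millions of vertices.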