Prediction of missing sequences and branch lengths in phylogenomic data
Author(s) - Diego Darriba, Michael Weiß, Alexandros Stamatakis
Publication year - 2016
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btv768
Subject(s) - missing data, phylogenetic tree, sequence (biology), inference, computer science, partition (number theory), tree (set theory), algorithm, data mining, biology, computational biology, artificial intelligence, mathematics, gene, genetics, machine learning, combinatorics
Missing data in large-scale phylogenomic datasets negatively affects phylogenetic inference. One effect of alignments with missing per-gene or per-partition sequences is that the inferred phylogenies may exhibit extremely long branches. We investigate whether statistically predicting missing sequences for organisms, using information from genes/partitions that do have data for these organisms, alleviates the problem and improves phylogenetic accuracy.