Open Access
Fast Estimation of Recombination Rates Using Topological Data Analysis
Author(s) - Devon P. Humphreys, Melissa McGuirl, Miriam Miyagi, Andrew J. Blumberg
Publication year - 2019
Publication title - Genetics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.792
H-Index - 246
eISSN - 1943-2631
pISSN - 0016-6731
DOI - 10.1534/genetics.118.301565
Subject(s) - coalescent theory , inference , estimator , genome , tree (set theory) , recombination , topological data analysis , biology , computer science , data mining , topology (electrical circuits) , genetics , algorithm , mathematics , phylogenetic tree , statistics , artificial intelligence , gene , combinatorics
Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (β1) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating β1 to population genetic models. Using simulations, we show that ψ and β1 are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE’s efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
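To illustrate what "the number of loops" captured by β1 means, the sketch below computes the first Betti number of a Vietoris–Rips complex built on distinct sequences using Hamming distance at one fixed scale `eps`, with GF(2) coefficients. This is a simplification under stated assumptions. Persistent-homology approaches track β1 across all scales rather than at a single `eps`. The sketch is not TREE's implementation, and it does not compute ψ. The helper names `hamming`, `gf2_rank`, and `betti1_rips` are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gf2_rank(rows: list[int]) -> int:
    """Rank over GF(2) of a 0/1 matrix whose rows are encoded as integer bitmasks."""
    basis: dict[int, int] = {}  # leading bit -> reduced basis row with that leading bit
    rank = 0
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead in basis:
                row ^= basis[lead]  # eliminate the leading bit (XOR = addition mod 2)
            else:
                basis[lead] = row
                rank += 1
                break
    return rank


def betti1_rips(seqs: list[str], eps: int) -> int:
    """Hypothetical helper: beta_1 of the Vietoris-Rips complex at a single scale eps.

    Assumption: vertices are the distinct sequences, an edge joins two sequences
    whose Hamming distance is <= eps, and a triangle is any triple whose three
    edges are all present.
    """
    pts = sorted(set(seqs))  # identical haplotypes collapse to one vertex
    n = len(pts)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if hamming(pts[i], pts[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_index and (i, k) in edge_index and (j, k) in edge_index]
    # Boundary of edge (i, j) is i + j over GF(2): one bitmask over vertices per edge.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of triangle (i, j, k) is (i,j) + (i,k) + (j,k): one bitmask over edges.
    d2 = [(1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)]) | (1 << edge_index[(j, k)])
          for i, j, k in triangles]
    # beta_1 = dim ker d1 - rank d2 = (#edges - rank d1) - rank d2
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


if __name__ == "__main__":
    # All four gametes at two biallelic sites form a 4-cycle at eps = 1.
    print(betti1_rips(["00", "01", "10", "11"], 1))
    # Sequences related along a single path (tree-like) leave no loop.
    print(betti1_rips(["000", "100", "110", "111"], 1))
```

In the usage example, the four haplotypes `00`, `01`, `10`, `11` yield β1 = 1 at `eps = 1`. This is the four-gamete pattern, which under an infinite-sites assumption cannot arise on a single tree without recombination or recurrent mutation. The path-like sample yields β1 = 0. Raising `eps` to 2 fills the cycle with triangles, and the loop disappears. That scale dependence is why persistence-based methods summarize β1 over a range of scales rather than at one threshold.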