How to Infer Relative Fitness from a Sample of Genomic Sequences

Adel Dayarian; Boris I. Shraiman

Open Access

Author(s) - Adel Dayarian, Boris I. Shraiman

Publication year - 2014

Publication title - Genetics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.792

H-Index - 246

eISSN - 1943-2631

pISSN - 0016-6731

DOI - 10.1534/genetics.113.160986

Subject(s) - biology, selection (genetic algorithm), coalescent theory, genetic fitness, evolutionary biology, ranking (information retrieval), natural selection, population, inference, fitness landscape, mutation rate, genetics, statistics, mathematics, computer science, artificial intelligence, phylogenetics, gene, demography, sociology

Mounting evidence suggests that natural populations can harbor extensive fitness diversity, with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection differ quantifiably from those expected under neutral evolution, which are described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used to test for the presence of selection, the full extent of the information they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples, taken from a fitness-diverse but otherwise unstructured asexual population, can be used to predict the relative fitness of individuals within the sample. To achieve this, we define a heuristic algorithm, which we test in silico using simulations of a Wright-Fisher model over a realistic range of mutation rates and selection strengths. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. The inferred ranking correlates strongly with actual fitness: a genome ranked in the top 10% is among the 20% fittest, with a false discovery rate of 0.1-0.3 depending on the mutation and selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While inference accuracy increases monotonically with sample size, samples of 200 genomes nearly saturate performance. We propose that our approach can be used to infer the relative fitness of genomes obtained by single-cell sequencing of tumors, as well as in monitoring viral outbreaks.
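The accuracy figure quoted above can be made concrete. Below is a minimal sketch, assuming the metric means: among genomes ranked in the top 10% by inferred fitness, the fraction that are not among the 20% truly fittest. The function name `top_fraction_fdr`, the rounding of group sizes, and the tie handling are illustrative assumptions, not the authors' code. The sketch only scores a given ranking against known fitness (as in a simulation); it does not implement the tree-based discriminator itself.

```python
from typing import Sequence


def top_fraction_fdr(
    inferred_score: Sequence[float],
    true_fitness: Sequence[float],
    ranked_frac: float = 0.10,
    fit_frac: float = 0.20,
) -> float:
    """Hypothetical helper: fraction of genomes in the top `ranked_frac` by
    inferred score that are NOT in the top `fit_frac` by true fitness."""
    n = len(inferred_score)
    if n == 0 or n != len(true_fitness):
        raise ValueError("need two non-empty sequences of equal length")

    # Group sizes; at least one genome in each group (rounding is an assumption).
    n_ranked = max(1, round(ranked_frac * n))
    n_fit = max(1, round(fit_frac * n))

    # Indices ordered from highest to lowest value. Ties keep input order
    # because sorted() is stable, so tie-breaking here is arbitrary.
    by_score = sorted(range(n), key=lambda i: inferred_score[i], reverse=True)
    by_fitness = sorted(range(n), key=lambda i: true_fitness[i], reverse=True)

    top_ranked = set(by_score[:n_ranked])
    top_fit = set(by_fitness[:n_fit])

    # False discoveries: top-ranked genomes outside the truly fittest group.
    return len(top_ranked - top_fit) / n_ranked


if __name__ == "__main__":
    import random

    # Toy data only: a noisy score standing in for any inferred ranking.
    random.seed(0)
    fitness = [random.gauss(0.0, 1.0) for _ in range(200)]
    score = [f + random.gauss(0.0, 0.5) for f in fitness]
    print(f"FDR (top 10% ranked vs. top 20% fittest): "
          f"{top_fraction_fdr(score, fitness):.2f}")
```

A perfect ranking gives 0.0 and a fully reversed one gives 1.0; the 0.1-0.3 range reported in the abstract sits between these extremes. Comparing against a larger "fittest" group (20%) than the "ranked" group (10%) tolerates small ranking errors near the boundary.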