
The Yule Approximation for the Site Frequency Spectrum after a Selective Sweep
Author(s) - Sebastian Bossert, Peter Pfaffelhuber
Publication year - 2013
Publication title - PLoS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0081738
Subject(s) - allele frequency, fixation (population genetics), natural selection, selection (genetic algorithm), population, frequency dependent selection, selective sweep, biology, minor allele frequency, genetics, neutral theory of molecular evolution, evolutionary biology, allele, computer science, artificial intelligence, gene, demography, sociology, haplotype
In evolutionary theory, a key question is which portions of a species' genome are targets of natural selection. Genetic hitchhiking is a theoretical concept that has helped to identify various such targets in natural populations. In the presence of recombination, a severe reduction in sequence diversity is expected around a strongly beneficial allele. The site frequency spectrum is an important tool in genome scans for selection and is composed of the numbers $S_1, \dots, S_{n-1}$, where $S_k$ is the number of single nucleotide polymorphisms (SNPs) present in $k$ of $n$ individuals. Previous work has shown that the numbers of both low- and high-frequency variants are elevated relative to neutral evolution when a strongly beneficial allele fixes. Here, we follow a recent investigation of genetic hitchhiking and use a marked Yule process to obtain an analytical prediction of the site frequency spectrum in a panmictic population at the time of fixation of a highly beneficial mutation. We combine standard results from the neutral case with the effects of a selective sweep. Simulations show that the resulting formula predicts the whole frequency spectrum more accurately than previous approaches. In particular, the formula correctly predicts the elevation of low- and high-frequency variants and is significantly more accurate than previously derived formulas for intermediate-frequency variants.
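To make the definition of the site frequency spectrum concrete, the following is a minimal sketch, assuming a sample of $n$ haplotypes coded as 0 (ancestral) and 1 (derived) at each SNP, so the spectrum is unfolded (polarized). It tallies $S_k$ for $k = 1, \dots, n-1$ and also returns the standard neutral baseline $\mathbb{E}[S_k] = \theta/k$ under the neutral coalescent, against which sweep-induced excesses of low- and high-frequency variants would be measured. The function names `site_frequency_spectrum` and `neutral_expected_sfs` are hypothetical, and this code does not implement the marked-Yule sweep formula derived in the paper.

```python
from typing import List, Sequence


def site_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Return [S_1, ..., S_{n-1}] from an n x L matrix of 0/1 alleles.

    Assumption: 1 marks the derived allele, 0 the ancestral allele.
    Monomorphic sites (k = 0 or k = n) are not counted.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    n_sites = len(haplotypes[0])
    if any(len(row) != n_sites for row in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    sfs = [0] * (n - 1)
    for site in range(n_sites):
        # k = number of sampled individuals carrying the derived allele
        k = sum(row[site] for row in haplotypes)
        if 0 < k < n:
            sfs[k - 1] += 1
    return sfs


def neutral_expected_sfs(n: int, theta: float) -> List[float]:
    """Neutral expectation E[S_k] = theta / k for k = 1, ..., n-1."""
    return [theta / k for k in range(1, n)]


if __name__ == "__main__":
    sample = [
        [1, 1, 0, 1],
        [1, 0, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(site_frequency_spectrum(sample))   # [3, 0, 1]
    print(neutral_expected_sfs(4, theta=1.0))
```

In the toy sample, three SNPs are singletons ($S_1 = 3$) and one SNP is carried by three of the four individuals ($S_3 = 1$). Comparing such observed counts with $\theta/k$ is the kind of contrast the abstract describes, where a recent sweep inflates the extreme classes relative to the neutral curve.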