
Haplotype inference from diploid sequence data: evaluating performance using non‐neutral MHC sequences
Author(s) - Bos David H., Turner Sara M., DeWoody J. Andrew
Publication year - 2007
Publication title - Hereditas
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.819
H-Index - 50
eISSN - 1601-5223
pISSN - 0018-0661
DOI - 10.1111/j.2007.0018-0661.01994.x
Subject(s) - biology , ploidy , inference , major histocompatibility complex , genetics , haplotype , computational biology , population , neutral theory of molecular evolution , selection (genetic algorithm) , evolutionary biology , computer science , gene , allele , machine learning , artificial intelligence , demography , sociology
The direct sequencing of PCR products from diploid organisms is problematic because of ambiguities associated with phase inference in multi‐site heterozygotes. Several molecular methods, such as cloning, SSCP, and DGGE, have been developed to empirically reduce diploid sequences to their constituent haploid components, but in theory these empirical approaches can be supplanted by analytical treatment of diploid sequences. Analytical approaches are preferable to molecular methods because they avoid the added time and expense required to generate molecular data. A variety of analytical methods have been developed to address this issue, but few have been rigorously evaluated with empirical data. Furthermore, they all assume that the sequences under consideration are evolving neutrally and contain a moderate number of heterozygous sites. Here, we use non‐neutral major histocompatibility complex (MHC) sequences, which contain large numbers of heterozygous sites and are under strong balancing selection, to evaluate the performance of the popular Bayesian algorithm implemented in the program PHASE. Our results suggest that PHASE performs admirably with non‐neutral sequences of moderate length that carry numerous heterozygous sites, as is typical of MHC class II sequences. We conclude that analytical approaches to haplotype inference have great potential in large‐scale population genetic assays, but we recommend ground‐truthing analytical results with empirical (molecular) approaches at the outset of population‐level analyses.
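The phase ambiguity described above can be made concrete with a small sketch. It is not PHASE's Bayesian algorithm. It only enumerates the candidate haplotype pairs that any phasing method must choose among. It assumes that heterozygous sites in a directly sequenced PCR product are recorded with the standard IUPAC two-base ambiguity codes. The function names (`compatible_haplotype_pairs`, `same_phase`) are hypothetical, and `same_phase` merely illustrates an order-independent check of inferred haplotypes against empirically determined (e.g. cloned) ones.

```python
from itertools import product

# Standard IUPAC two-base ambiguity codes; assumed here to mark heterozygous
# sites in a directly sequenced diploid PCR product.
IUPAC_HET = {
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def heterozygous_sites(diploid: str) -> list[int]:
    """Return 0-based positions that carry a two-base ambiguity code."""
    return [i for i, base in enumerate(diploid.upper()) if base in IUPAC_HET]


def compatible_haplotype_pairs(diploid: str) -> list[tuple[str, str]]:
    """List every unordered haplotype pair consistent with one diploid sequence.

    With k heterozygous sites there are 2**(k - 1) such pairs (1 if k == 0),
    so phase cannot be read directly from the diploid sequence.
    """
    seq = diploid.upper()
    het = heterozygous_sites(seq)
    if not het:
        # Homozygous at every site: both haplotypes equal the sequence.
        return [(seq, seq)]
    pairs = []
    # Fix the phase at the first heterozygous site so that each unordered
    # pair is produced exactly once.
    for choice in product((0, 1), repeat=len(het) - 1):
        phases = (0,) + choice
        hap1, hap2 = list(seq), list(seq)
        for pos, p in zip(het, phases):
            a, b = IUPAC_HET[seq[pos]]
            hap1[pos], hap2[pos] = (a, b) if p == 0 else (b, a)
        pairs.append(("".join(hap1), "".join(hap2)))
    return pairs


def same_phase(inferred: tuple[str, str], empirical: tuple[str, str]) -> bool:
    """Compare an inferred pair with an empirical pair, ignoring order."""
    return sorted(inferred) == sorted(empirical)
```

For example, `compatible_haplotype_pairs("AYGR")` yields the two candidates `("ACGA", "ATGG")` and `("ACGG", "ATGA")`. Ten heterozygous sites already give 512 candidate pairs, which is one way to see why sequences with many heterozygous sites, such as MHC class II sequences, are a demanding test for analytical phasing.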