STATISTICAL ANALYSIS OF HETEROZYGOSITY DATA: INDEPENDENT SAMPLE COMPARISONS
Author(s) - Archie, James W.
Publication year - 1985
Publication title - Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.84
H-Index - 199
eISSN - 1558-5646
pISSN - 0014-3820
DOI - 10.1111/j.1558-5646.1985.tb00399.x
Subject(s) - loss of heterozygosity, biology, sample size determination, statistical power, genetics, null hypothesis, statistics, population, statistical hypothesis testing, allele frequency, evolutionary biology, allele, mathematics, demography, gene, sociology
The distribution of mean heterozygosities under an infinite-allele model with a constant mutation rate was examined by simulation. As expected, the variance of this distribution decreases as more loci are examined, but its shape may remain skewed or bimodal. The distribution becomes more symmetrical as mean heterozygosity and the number of loci increase. Consequently, parametric statistical tests may not be valid for comparing populations or species. Independent-sample t-tests were examined in detail to determine how often the null hypothesis is rejected when pairs of samples are drawn from populations with the same mean heterozygosity, across a range of numbers of loci and levels of mean heterozygosity. For mean heterozygosity above 7.5%, t-tests give the nominal rejection rate with as few as five loci. When mean heterozygosity is as low as 2.5%, the t-test is conservative even when 40 loci are examined in each population. The power of independent-sample t-tests to detect true differences between populations was then examined as the size of the difference and the number of loci vary. Although large differences are detected with high probability, detecting differences on the order of 5% heterozygosity with 80% or greater certainty may require examining large numbers of loci (>40). Moreover, with small numbers of loci (<25), a statistically significant result for a difference of interesting magnitude requires a relatively rare sampling event, in which the difference observed between the samples is much larger than the difference between the population means. The tests lack sensitivity for two reasons. First, when mean heterozygosity is low, the non-normality of the sample means is probably the most important factor.
Second, even when mean heterozygosity is high, or when sample sizes are large enough that sample means are approximately normally distributed, the intrinsically high interlocus variance of heterozygosity estimates makes the tests insensitive to heterozygosity differences that may be biologically meaningful. Finally, the implications of these results are discussed in relation to the low correlations observed between heterozygosity and other explanatory variables.
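The abstract does not spell out the simulation procedure. The Python sketch below illustrates the kind of experiment it describes, under assumptions that are ours rather than the paper's. At each locus, allele counts are drawn independently from the Ewens sampling distribution (via Hoppe's urn) with the same θ at every locus. θ is set so that the expected single-locus heterozygosity θ/(1+θ) equals the target mean. Single-locus heterozygosity is estimated with Nei's unbiased estimator. Loci serve as the observations in a pooled-variance two-sample t-test, and each locus is scored in a hypothetical sample of 100 genes. All function names and defaults are hypothetical.

```python
# Sketch under stated assumptions; not the procedure used in the paper.
import math
import random
import statistics


def ewens_sample(n_genes, theta, rng):
    """Allele counts for n_genes under the neutral infinite-alleles model,
    generated with Hoppe's urn (Ewens sampling distribution). theta > 0."""
    counts = []
    for k in range(n_genes):
        # Gene k+1 carries a new allele with probability theta / (theta + k).
        if rng.random() < theta / (theta + k):
            counts.append(1)
            continue
        # Otherwise copy a random earlier gene: pick an allele by its count.
        r = rng.randrange(k)
        for i, c in enumerate(counts):
            if r < c:
                counts[i] += 1
                break
            r -= c
    return counts


def locus_heterozygosity(counts):
    """Nei's unbiased single-locus heterozygosity, n/(n-1) * (1 - sum p_i^2)."""
    n = sum(counts)
    homozygosity = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - homozygosity)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, by Simpson integration of the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # total * h / 3 approximates P(0 < T < |t|).
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def pooled_t(x, y):
    """Pooled-variance two-sample t statistic and degrees of freedom.
    Returns (None, df) when both samples have zero variance (e.g. every
    locus monomorphic), where t is undefined."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / df
    if sp2 == 0.0:
        return None, df
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (statistics.fmean(x) - statistics.fmean(y)) / se, df


def rejection_rate(h1, h2, n_loci, n_genes=100, reps=500, alpha=0.05, seed=1):
    """Fraction of replicate sample pairs in which the t-test rejects equal
    mean heterozygosity. h1 == h2 estimates the type I error rate; h1 != h2
    estimates power. Requires 0 < h < 1 and n_loci >= 2."""
    rng = random.Random(seed)
    # Expected single-locus heterozygosity is theta / (1 + theta).
    thetas = [h / (1.0 - h) for h in (h1, h2)]
    rejections = 0
    for _ in range(reps):
        samples = [
            [locus_heterozygosity(ewens_sample(n_genes, th, rng))
             for _ in range(n_loci)]
            for th in thetas
        ]
        t, df = pooled_t(*samples)
        # Undefined t (all loci monomorphic) is counted as a non-rejection.
        if t is not None and t_two_sided_p(t, df) < alpha:
            rejections += 1
    return rejections / reps
```

For example, `rejection_rate(0.025, 0.025, n_loci=40)` estimates the type I error rate at 2.5% mean heterozygosity with 40 loci, and `rejection_rate(0.10, 0.15, n_loci=40)` estimates power for a 5% difference. These Monte Carlo estimates depend on the assumed genes per locus and number of replicates, and they are not the paper's results. The t-distribution p-value is computed by hand to keep the sketch free of dependencies. `scipy.stats.ttest_ind` would be the usual choice where SciPy is available.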