Open Access
MSTRAT: An Algorithm for Building Germ Plasm Core Collections by Maximizing Allelic or Phenotypic Richness
Author(s) -
Brigitte Gouesnard,
Thomas Bataillon,
G. Decoux,
C. Rozale,
Daniel J. Schoen,
Jacques L David
Publication year - 2001
Publication title - Journal of Heredity
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 92
eISSN - 1471-8505
pISSN - 0022-1503
DOI - 10.1093/jhered/92.1.93
Subject(s) - biology , germ plasm , allele , germ , phenotype , core (optical fiber) , species richness , genetics , evolutionary biology , computational biology , computer science , paleontology , gene , microbiology and biotechnology , telecommunications
A core collection is a subsample of a larger germ plasm collection that contains, with a minimum of repetitiveness, the maximum possible genetic diversity of the species in question (Frankel 1984; Frankel and Brown 1984). Brown (1989a) argued that before the core collection is created, the larger collection should first be hierarchically stratified into groups of accessions that share common characters or that originate from similar ecological and geographic regions. Such a stratification could be based on passport data, knowledge of the structure of the gene pool, or both. Accessions are then drawn from each group. Several sampling strategies are used to determine how sampling effort is allocated across groups (Brown 1989b). In the absence of detailed genetic data about the individuals within the groups, groups can be represented evenly, in proportion to their size, or in proportion to the logarithm of their size (referred to as the C-, P-, and L-strategies, respectively) (Brown 1989b).
An increasing number of germ plasm collections are being genotyped for marker loci such as allozymes, restriction fragment length polymorphisms (RFLPs), and random amplified polymorphic DNA (RAPD). Schoen and Brown (1993) proposed two strategies that use marker diversity to allocate sampling effort for the construction of the core collection. The H strategy seeks to maximize the total number of alleles in the core collection by sampling accessions from groups in proportion to their within-group genetic diversity. This approach assumes that the sampled alleles follow Ewens' (1972) sampling theory for neutral alleles, although it is robust to several types of departure from this assumption (Brown and Schoen 1994). Schoen and Brown (1993) also formulated an alternative, the so-called M (or maximization) strategy, which does not necessarily rely on stratified sampling.
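The group-allocation rules described above (C, P, L, and H) can be sketched as a single helper. This is a minimal illustration, not code from MSTRAT: the function name `allocate`, the largest-remainder rounding used to turn fractional quotas into whole accessions, and the idea that the H strategy receives a precomputed per-group diversity score are all assumptions of this sketch.

```python
import math

def allocate(core_size, group_sizes, strategy, diversity=None):
    """Split `core_size` draws across groups (hypothetical helper).

    strategy: "C" equal shares, "P" proportional to group size,
              "L" proportional to log(group size), "H" proportional to a
              caller-supplied within-group diversity score (an assumption).
    """
    if strategy == "C":
        weights = [1.0] * len(group_sizes)
    elif strategy == "P":
        weights = [float(n) for n in group_sizes]
    elif strategy == "L":
        weights = [math.log(n) for n in group_sizes]
    elif strategy == "H":
        if diversity is None:
            raise ValueError("the H strategy needs one diversity score per group")
        weights = [float(h) for h in diversity]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    total = sum(weights)
    quotas = [core_size * w / total for w in weights]
    counts = [math.floor(q) for q in quotas]
    # Largest-remainder rounding: give leftover draws to the biggest fractions.
    leftover = core_size - sum(counts)
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts

sizes = [60, 30, 10]  # hypothetical group sizes
for s in ("C", "P", "L"):
    print(s, allocate(20, sizes, s))
# C [7, 7, 6]
# P [12, 6, 2]
# L [8, 7, 5]
print("H", allocate(20, sizes, "H", diversity=[0.1, 0.6, 0.3]))
# H [2, 12, 6]
```

The sketch does not cap a group's share at its actual size, and under the L rule a group of size 1 receives weight log(1) = 0. A real implementation would need to handle both cases.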
The M strategy examines all possible core collections and singles out those that maximize the number of observed alleles at the marker loci; these can then be chosen as final candidates for the core. The expected superiority of this marker-based method rests on the correlation between observed allelic richness at the marker loci and allelic richness at other loci (hereafter referred to as “target loci”). Such a correlation (that is, linkage disequilibrium between marker and target alleles) is expected on theoretical grounds because of (1) shared coancestry of populations, (2) the mating system of the species considered, or (3) episodes of selection in which selected (target) and neutral (marker) alleles become associated through hitchhiking. Monte Carlo simulations of germ plasm collection and sampling under several marker-based sampling strategies have shown that the M strategy performs well when the accessions come from populations with restricted gene flow or when the accessions are predominantly selfing (Bataillon et al. 1996). Although it was initially based on variation at marker loci, the M strategy can be extended to qualitative and quantitative variables. For a quantitative variable, the continuous distribution can be broken into a series of discrete classes; each accession then belongs to one or several classes, depending on the values of the individuals comprising the accession. For a qualitative variable, the number of classes is determined by the possible values the variable can take. For example, if disease resistance is coded as either resistant or susceptible, there are two classes. The richness of a collection of accessions for such a qualitative variable is defined as the number of classes represented among the accessions.
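The richness definition and the exhaustive M strategy described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not MSTRAT code. Each accession is represented as a mapping from variable name to the set of alleles or classes observed in it. The bin edges for the quantitative variable are hypothetical, chosen by the user. The helper names `accession_classes`, `richness`, and `m_strategy` are likewise hypothetical.

```python
import bisect
from itertools import combinations

def accession_classes(individual_values, edges):
    """Classes occupied by one accession for a quantitative variable.

    `edges` are user-chosen cut points (an assumption of this sketch):
    value v falls in class bisect_right(edges, v), so k edges give k + 1 classes.
    """
    return {bisect.bisect_right(edges, v) for v in individual_values}

def richness(core, data):
    """Number of distinct classes (alleles) present in `core`, summed over variables.

    data[accession][variable] is the set of classes or alleles observed
    in that accession.
    """
    variables = {var for acc in core for var in data[acc]}
    return sum(
        len(set().union(*(data[acc].get(var, set()) for acc in core)))
        for var in variables
    )

def m_strategy(data, core_size):
    """Exhaustive M strategy: score every core of `core_size`, keep all the richest."""
    best_score, best_cores = -1, []
    for core in combinations(sorted(data), core_size):
        score = richness(core, data)
        if score > best_score:
            best_score, best_cores = score, [core]
        elif score == best_score:
            best_cores.append(core)
    return best_score, best_cores

# Hypothetical toy collection: alleles at two marker loci per accession.
collection = {
    "A1": {"locus1": {"a", "b"}, "locus2": {"x"}},
    "A2": {"locus1": {"a"}, "locus2": {"x", "y"}},
    "A3": {"locus1": {"c"}, "locus2": {"x"}},
    "A4": {"locus1": {"a", "b"}, "locus2": {"z"}},
}

print(m_strategy(collection, 2))
# (5, [('A2', 'A4'), ('A3', 'A4')])
print(accession_classes([12.0, 18.5, 25.0], edges=[15, 20]))
# {0, 1, 2}
```

Because every one of the C(n, k) possible cores is scored, exhaustive search is practical only for small collections. The example also shows why the strategy "singles out" candidates in the plural: two different cores tie for the maximum.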
When several variables corresponding to several traits and/or marker loci are considered, the total richness is defined as the sum of the richness values across variables. The contribution of each variable to this sum may be weighted by the importance of that variable; for instance, a given variable may be an important trait or a locus for which allelic variation is desired. The MSTRAT software implements this generalized version of the M strategy and helps the user define the size of the core collection to be sampled and choose the type of genetic richness to be maximized in it. The software also allows the user to investigate how much genetic richness has been retained for variables that were not used to sample the core collection.
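The weighted total richness and the check on variables left out of sampling can be illustrated as follows. This is a minimal sketch, not MSTRAT's implementation. The search routine is a hypothetical first-improvement swap heuristic, standing in for exhaustive enumeration on larger collections. It is an assumption of this sketch, not a description of MSTRAT's documented procedure. The names `weighted_richness`, `swap_search`, and `retained_fraction` are hypothetical.

```python
import random

def weighted_richness(core, data, weights):
    """Total richness: sum over weighted variables of weight * classes present."""
    total = 0.0
    for var, w in weights.items():
        present = set()
        for acc in core:
            present |= data[acc].get(var, set())
        total += w * len(present)
    return total

def swap_search(data, weights, core_size, max_rounds=100, seed=0):
    """Hypothetical first-improvement swap heuristic (not MSTRAT's documented method).

    Starts from a random core and repeatedly accepts the first swap of one
    member for one non-member that raises the weighted richness.
    """
    rng = random.Random(seed)
    accessions = sorted(data)
    core = set(rng.sample(accessions, core_size))
    best = weighted_richness(core, data, weights)
    for _ in range(max_rounds):
        improved = False
        for out in sorted(core):
            for new in accessions:
                if new in core:
                    continue
                candidate = (core - {out}) | {new}
                score = weighted_richness(candidate, data, weights)
                if score > best:
                    core, best, improved = candidate, score, True
                    break
            if improved:
                break
        if not improved:
            break  # local optimum: no single swap helps
    return sorted(core), best

def retained_fraction(core, data, var):
    """Share of the whole collection's classes for `var` that survive in `core`."""
    whole = set().union(*(data[a].get(var, set()) for a in data))
    kept = set().union(*(data[a].get(var, set()) for a in core))
    return len(kept) / len(whole)

# Hypothetical toy collection; "resistance" is not used when sampling.
collection = {
    "A1": {"locus1": {"a", "b"}, "locus2": {"x"}, "resistance": {"resistant"}},
    "A2": {"locus1": {"a"}, "locus2": {"x", "y"}, "resistance": {"susceptible"}},
    "A3": {"locus1": {"c"}, "locus2": {"x"}, "resistance": {"susceptible"}},
    "A4": {"locus1": {"a", "b"}, "locus2": {"z"}, "resistance": {"susceptible"}},
}

core, score = swap_search(collection, {"locus1": 2.0, "locus2": 1.0}, core_size=2)
print(core, score)                                        # ['A3', 'A4'] 8.0
print(retained_fraction(core, collection, "resistance"))  # 0.5
```

In this toy case, weighting locus1 twice as heavily yields a core that keeps every marker class at locus1. That same core, however, retains only half of the classes of the unsampled resistance variable, which is the kind of loss the retained-richness check is meant to expose.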