Building genomic infrastructure: Sequencing platinum‐standard reference‐quality genomes of all cetacean species
Author(s) -
Morin Phillip A.,
Alexander Alana,
Blaxter Mark,
Caballero Susana,
Fedrigo Olivier,
Fontaine Michael C.,
Foote Andrew D.,
Kuraku Shigehiro,
Maloney Brigid,
McCarthy Morgan L.,
McGowen Michael R.,
Mountcastle Jacquelyn,
Nery Mariana F.,
Olsen Morten Tange,
Rosel Patricia E.,
Jarvis Erich D.
Publication year - 2020
Publication title - Marine Mammal Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.723
H-Index - 78
eISSN - 1748-7692
pISSN - 0824-0469
DOI - 10.1111/mms.12721
Subject(s) - library science , geography , archaeology , computer science
In 2001 it was announced that the 3.1 billion base (gigabase, Gb) human genome had been sequenced, but after 13 years of work and US$2.7 billion in cost, it was still considered to be only a draft. The initial assembly was missing over 30% of the genome and was made up of over 100,000 sequence fragments (scaffolds) with an average size of just 81,500 base pairs (bp) (International Human Genome Sequencing Consortium, 2004; Stein, 2004). As technologies improved, the draft human genome assembly was repeatedly refined and corrected. By the time the genome assembly was published in 2004, the average length of scaffolds had increased to over 38 million bp (megabases, Mb) with only a few hundred gaps in the chromosome-length scaffolds. However, the duplicated and highly repetitive regions of the human genome remained unresolved due to limitations of short-read sequencing technology that requires piecing the genome together from billions of shorter sequences. Over the last decade, as highly parallel, much less expensive, short- and long-read sequencing technologies have revolutionized genomic sequencing, thousands of individual human genomes have been sequenced, further refining the human genome assembly and characterizing its diversity. Together these genome sequences have produced a “reference-quality” human genome assembly that covers 95% of the genome with far fewer and smaller gaps compared to the initial version. Despite this vast improvement, the human genome continues to be updated and refined (v. 39, RefSeq accession GCF_000001405.39). This example illustrates how all eukaryotic genome assemblies, even those of exemplar quality, are drafts, varying in sequence quality (i.e., error rate), completeness (i.e., how much of the genome is covered), how contiguous DNA sequences within scaffolds are (i.e., how many gaps), and what portions of the genome remain unresolved or incorrect.
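The quality dimensions named above (number of scaffolds, average scaffold size, number of gaps, and the fraction of the genome covered) can be illustrated with a minimal Python sketch. It is not taken from the article: the function name `assembly_summary`, the toy sequences, and the convention that a gap is a run of `N` characters inside a scaffold are all assumptions made for illustration, and real assemblies would be read from FASTA files rather than a list of strings.

```python
import re
from typing import Dict, List


def assembly_summary(scaffolds: List[str], expected_genome_size: int) -> Dict[str, float]:
    """Summarize contiguity and completeness of a toy assembly.

    Hypothetical helper for illustration only. Assumes gaps are
    encoded as runs of 'N' and that the expected genome size is known.
    """
    if not scaffolds or expected_genome_size <= 0:
        raise ValueError("need at least one scaffold and a positive genome size")

    upper = [s.upper() for s in scaffolds]
    total_bp = sum(len(s) for s in upper)
    # Count each uninterrupted run of N as a single gap.
    n_gaps = sum(len(re.findall(r"N+", s)) for s in upper)
    # Bases that are actually resolved (anything other than N).
    resolved_bp = sum(len(s) - s.count("N") for s in upper)

    return {
        "n_scaffolds": len(upper),
        "mean_scaffold_bp": total_bp / len(upper),
        "n_gaps": n_gaps,
        "fraction_covered": resolved_bp / expected_genome_size,
    }


# Toy data: three scaffolds from a pretend 40 bp genome.
toy_scaffolds = ["ACGTNNNNACGT", "ACGTACGT", "ACNNGT"]
summary = assembly_summary(toy_scaffolds, expected_genome_size=40)
print(summary)
```

Under these assumptions, two assemblies of the same genome can be compared on the same axes the text uses: fewer scaffolds, a larger mean scaffold size, fewer gaps, and a higher covered fraction all indicate a less "drafty" assembly. Sequence error rate, the remaining dimension mentioned above, cannot be derived from the assembly alone and is not modeled here.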
The “platinum-standard reference genome” that modern genomics strives for is distinguished from other draft assemblies by completeness, low error rates, and a high percentage of the sequences assembled into chromosome-length scaffolds (Anonymous, 2018; Rhie et al., 2020). For the remainder of this note, we use “draft” to refer to the less complete/contiguous “draftier draft” genomes and “reference-quality genomes” to refer to platinum-standard reference genomes as characterized above. Democratization of genome sequencing has yielded draft genomes across the diversity of life at a rate that was unimaginable just a few years ago. As genome assemblies have become increasingly common, titles of articles often tout “chromosome-level,” “complete,” “reference-quality,” and other adjectives to characterize the quality of a new genome sequence. These terms offer little information about the level of completion or accuracy of genome assemblies, as even chromosome-level genomes may consist of thousands to millions of sequence fragments (e.g., Fan et al., 2019), with significant amounts of missing data, assembly errors, and missing or incomplete genome annotations. Nevertheless, the utility of draft genomes has been abundantly documented, and there is no doubt that draft genomes provide sufficient data to address many biological questions. For cetaceans, highly fragmented draft genomes have been useful references for mapping data from resequenced individuals, and thus for characterization of variable markers (Morin et al., 2018), phylogenetics and comparative genomics (Arnason, Lammers, Kumar, Nilsson, & Janke, 2018; Fan et al., 2019; Foote et al., 2015; Yim et al., 2014), characterization of intraspecific variability and demographic history (Autenrieth et al., 2018; Foote et al., 2019; Foote et al., 2016; Morin et al., 2015;
Received: 31 March 2020; Accepted: 20 June 2020
