Long-read sequence assembly: a technical evaluation in barley
Author(s) -
Martin Mascher,
Thomas Wicker,
Jerry Jenkins,
Christopher Plott,
Thomas Lux,
ChuShin Koh,
Jennifer Ens,
Heidrun Gundlach,
Lori Beth Boston,
Zuzana Tulpová,
Samuel Holden,
Inmaculada Hernández-Pinzón,
Uwe Scholz,
Klaus Mayer,
M. Spannagl,
Curtis Pozniak,
Andrew Sharpe,
Hana Šimková,
Matthew Moscou,
Jane Grimwood,
Jeremy Schmutz,
Nils Stein
Publication year - 2021
Publication title - The Plant Cell
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 5.324
H-Index - 341
eISSN - 1532-298X
pISSN - 1040-4651
DOI - 10.1093/plcell/koab077
Subject(s) - biology, genome, triticeae, hordeum vulgare, sequence (biology), sequence assembly, hybrid genome assembly, computational biology, whole genome sequencing, reference genome, k mer, genetics, gene, botany, transcriptome, poaceae, gene expression
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence, which is based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
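The coverage arithmetic behind a downsampling analysis of this kind can be illustrated with a minimal sketch. It is not the authors' pipeline: the genome size of roughly 5 Gb and the 150 Gb read total are assumptions chosen for illustration, and the function names are hypothetical.

```python
def coverage(total_bases: int, genome_size: int) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def downsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep so that expected coverage drops to target_cov."""
    if target_cov <= 0:
        raise ValueError("target_cov must be positive")
    if target_cov > current_cov:
        raise ValueError("cannot downsample to a coverage above the current one")
    return target_cov / current_cov


# Assumed values for illustration only (not taken from the abstract).
GENOME_SIZE = 5_000_000_000      # assumed barley genome size, about 5 Gb
TOTAL_BASES = 150_000_000_000    # hypothetical CCS yield of 150 Gb

full_cov = coverage(TOTAL_BASES, GENOME_SIZE)

# The abstract mentions 20-fold and five-fold CCS coverage.
for target in (20, 5):
    frac = downsample_fraction(full_cov, target)
    print(f"{target}-fold: keep {frac:.3f} of reads")
```

The resulting fraction would typically be passed to a read-subsampling tool; which tool and which random seed were used in the study is not stated in the abstract.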