Detection and correction of assembly errors of rice Nipponbare reference sequence
Author(s) - Deng Y., Pan Y., Luo M.
Publication year - 2014
Publication title - Plant Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.871
H-Index - 87
eISSN - 1438-8677
pISSN - 1435-8603
DOI - 10.1111/plb.12090
Subject(s) - refseq, biology, genome, genetics, reference genome, sequence assembly, computational biology, whole genome sequencing, comparative genomics, sequence (biology), genomics, gene, gene expression, transcriptome
A complete, high-quality genome reference sequence of an organism provides a solid foundation for a wide research community and determines the outcomes of relevant genomic, genetic, molecular and evolutionary research. Rice is an important food crop and a model plant for grasses, and was therefore the first crop plant chosen for whole genome sequencing. The genome of the representative japonica rice variety, Nipponbare, was sequenced using a gold standard, map-based clone-by-clone strategy. However, although the Nipponbare reference sequence (RefSeq) has the best quality among existing crop genome sequences, it still contains many assembly errors and gaps. To improve the Nipponbare RefSeq, a robust method is first required to detect the hidden assembly errors. Through alignments between BAC-end sequences (BESs) embedded in the Nipponbare bacterial artificial chromosome (BAC) physical map and the Nipponbare RefSeq, we detected locations on the Nipponbare RefSeq that were inversely matched with BESs and could therefore be candidates for spurious inversions in the assembly. We performed further analysis of five potential locations and confirmed assembly errors at those locations; four of them, two on chr4 and two on chr11 of the Nipponbare RefSeq (IRGSP build 5), were found to be caused by reverse repetitive sequences flanking the locations. Our approach is effective in detecting spurious inversions in the Nipponbare RefSeq and can also be applied to improve the sequence quality of other genomes.
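The abstract does not spell out the exact criterion used to call a BES "inversely matched". As an illustration only, the following minimal Python sketch applies the standard paired-end orientation signature: the two ends of one BAC clone should align to the same chromosome on opposite strands and point toward each other, so a clone whose ends both align to the same strand may straddle an inversion breakpoint. The `BesHit` record, the function names, and the 300 kb `max_span` cutoff are hypothetical choices made for this sketch, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BesHit:
    """One BAC-end sequence alignment to the reference (hypothetical record)."""
    clone: str   # BAC clone name
    end: str     # "F" or "R": which end of the clone was sequenced
    chrom: str   # reference chromosome the end aligned to
    start: int   # leftmost reference coordinate of the alignment
    strand: str  # "+" or "-"


def classify_clone(f: BesHit, r: BesHit, max_span: int = 300_000) -> str:
    """Classify one clone from the alignments of its two ends.

    max_span is an assumed upper bound on a plausible BAC insert size.
    """
    if f.chrom != r.chrom:
        return "different_chromosomes"
    left, right = sorted((f, r), key=lambda h: h.start)
    if right.start - left.start > max_span:
        return "span_too_large"
    if left.strand == right.strand:
        # Both ends on the same strand: the signature expected when one end
        # lies inside an inverted segment and the other lies outside it.
        return "inversion_candidate"
    if left.strand == "+" and right.strand == "-":
        return "concordant"  # ends point toward each other, as expected
    return "everted"         # ends point away from each other


def inversion_candidates(
    hits: Iterable[BesHit], max_span: int = 300_000
) -> List[Tuple[str, str, int, int]]:
    """Return (clone, chrom, left_start, right_start) for flagged clones."""
    by_clone: Dict[str, Dict[str, BesHit]] = {}
    for h in hits:
        by_clone.setdefault(h.clone, {})[h.end] = h

    flagged = []
    for clone, ends in sorted(by_clone.items()):
        if "F" not in ends or "R" not in ends:
            continue  # skip clones with only one aligned end
        f, r = ends["F"], ends["R"]
        if classify_clone(f, r, max_span) == "inversion_candidate":
            lo, hi = sorted((f.start, r.start))
            flagged.append((clone, f.chrom, lo, hi))
    return flagged


if __name__ == "__main__":
    # Made-up coordinates, for demonstration only.
    demo = [
        BesHit("c1", "F", "chr4", 1_000, "+"),
        BesHit("c1", "R", "chr4", 150_000, "-"),
        BesHit("c2", "F", "chr4", 200_000, "+"),
        BesHit("c2", "R", "chr4", 330_000, "+"),
    ]
    for row in inversion_candidates(demo):
        print(row)
```

A single flagged clone is weak evidence; in practice one would likely require several independent clones to support the same interval, and would follow up each candidate with further analysis as the authors did for their five locations.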