Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach
Author(s) - Xuan Zhang, Pengyao Ping, György Hutvàgner, Michael Blumenstein, Jinyan Li
Publication year - 2021
Publication title - Nucleic Acids Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.008
H-Index - 537
eISSN - 1362-4954
pISSN - 0305-1048
DOI - 10.1093/nar/gkab610
Subject(s) - indel , biology , k mer , computational biology , error detection and correction , deep sequencing , base pair , dna sequencing , algorithm , systematic error , genetics , computer science , dna , genome , mathematics , statistics , gene , genotype , single nucleotide polymorphism
Raw sequencing reads of miRNAs contain machine-made substitution errors, and sometimes insertions and deletions (indels). Although the error rate can be as low as 0.1%, precise rectification of these errors is critically important because isoform variation analysis at single-base resolution, such as novel isomiR discovery, understanding of editing events, differential expression analysis, or tissue-specific isoform identification, is very sensitive to the base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data because the length and per-read coverage of miRNA reads differ from those of DNA or mRNA sequencing reads. We present a novel lattice structure combining k-mers, (k − 1)-mers and (k + 1)-mers to address this problem. The method is particularly effective for the correction of indel errors. Extensive tests on datasets with known ground-truth errors demonstrate that the method removes almost all of the errors without introducing any new ones, improving data quality from one error-containing read in every 50 reads to one in every 1300 reads. Studies on experimental miRNA sequencing datasets show that the errors are often rectified at the 5′ ends and in the seed regions of the reads, and that correction leads to remarkable changes in miRNA isoform abundance, the volume of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities.
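The abstract does not spell out how the lattice is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a k-mer is linked "down" to every observed (k − 1)-mer obtainable by deleting one base, and "up" to every observed (k + 1)-mer from which it can be obtained by deleting one base. The function names (`count_kmers`, `deletion_neighbours`, `lattice_links`) and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def deletion_neighbours(seq):
    """Return the distinct strings obtained by deleting one base from seq."""
    return {seq[:i] + seq[i + 1:] for i in range(len(seq))}


def lattice_links(reads, k):
    """Link each observed k-mer to observed (k-1)-mers and (k+1)-mers.

    Assumption for illustration: two nodes are linked when one can be
    turned into the other by deleting a single base.
    """
    lower = count_kmers(reads, k - 1)
    mid = count_kmers(reads, k)
    upper = count_kmers(reads, k + 1)

    links = {kmer: {"down": set(), "up": set()} for kmer in mid}
    for kmer in mid:
        # (k-1)-mers reachable by removing one base from this k-mer.
        links[kmer]["down"] = {s for s in deletion_neighbours(kmer) if s in lower}
    for longer in upper:
        # k-mers reachable by removing one base from a (k+1)-mer.
        for s in deletion_neighbours(longer):
            if s in mid:
                links[s]["up"].add(longer)
    return mid, links


# Toy data: nine copies of one read plus a single copy with an extra G.
toy_reads = ["TGAGGTAG"] * 9 + ["TGAGGGTAG"]
kmer_counts, links = lattice_links(toy_reads, k=8)
```

In this toy example the abundant 8-mer `TGAGGTAG` is linked upward to the rare 9-mer `TGAGGGTAG`. A large gap in copy count between linked nodes is the kind of signal a corrector could use to flag a likely insertion, though the actual decision rule used in the paper is not given here.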