Open Access
Comment on "A comprehensive overview and evaluation of circular RNA detection tools"
Author(s) - Chia-Ying Chen, Trees-Juen Chuang
Publication year - 2019
Publication title - PLOS Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1006158
Subject(s) - computational biology, circular RNA, RNA, computer science, biology, genetics, gene
A recent paper published in PLOS Computational Biology [1] provided a comprehensive evaluation of various circular RNA (circRNA)-detection tools. The authors compared 11 different circRNA-detection tools using four different datasets, including three simulated datasets (positive, background, and mixed datasets) and one real dataset. Since the advent of high-throughput next-generation sequencing technology, dozens of computational tools have been developed and used to successfully detect thousands of circRNAs in a diverse range of species. However, there are great discrepancies in the results obtained using different tools [2–7], and systematic evaluations of their performance have not been available. Indeed, the cited work has provided a useful guideline for researchers engaged in circRNA studies. Nevertheless, it seems inappropriate to use all CircBase-deposited circRNA candidates (14,689 events) identified in silico from RNA-seq data of HeLa cells [8] as the positive dataset. The qualification of the 14,689 candidates requires further evaluation. We suggest that three main confounding factors, which may affect the fairness of the evaluation of circRNA-detection tools, should be considered.

First, it has been shown that non-co-linear (NCL) junctions (including circRNA and trans-spliced RNA junctions) that do not match annotated exon boundaries tend to be unreliable and are more likely to stem from mis-splicing [9–12], although we cannot eliminate the possibility that a few true back-spliced junctions indeed originate from unannotated gene loci. Since circRNA candidates are regarded as less or more reliable if their normalized read counts are depleted or enriched after RNase R treatment, respectively [13], we reexamined the circRNA candidates detected in the HeLa RNase R-treated and untreated samples (the circRNA candidates and the corresponding read counts were downloaded from the cited study).
Of the circRNA candidates with unannotated exon boundaries, we found that 50%–100% were “completely” depleted (not detected) after RNase R treatment, whereas fewer than 8% were “significantly” enriched (i.e., a 5-fold increase in normalized read count) after RNase R treatment (Fig 1). This result indicates that candidates with unannotated exon boundaries are more likely to be false calls. Thus, we suggest that the CircBase circRNA candidates with unannotated exon boundaries (1,046 events; Table 1) should be excluded from the positive dataset. At a minimum, since circRNA junctions were observed to be predominantly located at canonical splice sites [14–16], the candidates whose junctions lack canonical splice site sequences (GT-AG, GC-AG, or AT-AC) should be removed (778 events; Table 1).

Second, ambiguous alignments originating from repetitive sequences or paralogous genes often result in false positive circRNA detection. In CircBase, most circRNA candidates were identified by find_circ [8]. It has been reported that some find_circ-identified candidates were mis-predicted from paralogous genes [17]. Therefore, the factor of alignment ambiguity should be considered when using CircBase circRNAs as true positives. To this end, we concatenated the exonic sequence flanking the circRNA junction (within -100 nucleotides
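
The two filtering criteria discussed under the first factor (the RNase R response of a candidate and the splice site sequence at its junction) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the category labels, and the reading of "5-fold increase" as a treated-to-untreated ratio of at least 5 are assumptions made for illustration, and the read counts are assumed to be normalized already, since the normalization method is not described in this excerpt.

```python
# Illustrative sketch only; names and labels are hypothetical, not from the paper.

# Canonical splice site dinucleotide pairs named in the text (donor, acceptor).
CANONICAL_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def rnase_r_response(untreated: float, treated: float, fold: float = 5.0) -> str:
    """Classify a candidate from its normalized read counts.

    Assumption: "completely depleted" means no reads after RNase R treatment,
    and "significantly enriched" means treated / untreated >= `fold`.
    """
    if untreated <= 0:
        raise ValueError("candidate must be detected in the untreated sample")
    if treated == 0:
        return "completely_depleted"
    if treated / untreated >= fold:
        return "significantly_enriched"
    return "other"


def has_canonical_splice_site(donor: str, acceptor: str) -> bool:
    """Return True if the junction-flanking dinucleotides form a canonical pair.

    Assumption: both dinucleotides are given on the transcript strand.
    """
    return (donor.upper(), acceptor.upper()) in CANONICAL_MOTIFS


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    print(rnase_r_response(untreated=2.0, treated=12.0))
    print(has_canonical_splice_site("GT", "AG"))
```

Keeping the fold threshold as a parameter makes it easy to test how sensitive the enriched fraction is to the chosen cutoff.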
