A naïve Bayesian classifier for identifying plant microRNAs
Author(s) -
Douglass Stephen,
Hsu SsuWei,
Cokus Shawn,
Goldberg Robert B.,
Harada John J.,
Pellegrini Matteo
Publication year - 2016
Publication title - The Plant Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.058
H-Index - 269
eISSN - 1365-313X
pISSN - 0960-7412
DOI - 10.1111/tpj.13180
Subject(s) - biology, microrna, computational biology, bayes' theorem, classifier (uml), bayesian probability, arabidopsis, genetics, artificial intelligence, gene, computer science, mutant
Summary - MicroRNAs (miRNAs) are important regulatory molecules in eukaryotic organisms. Existing methods for the identification of mature miRNA sequences in plants rely extensively on the search for stem–loop structures, leading to high false negative rates. Here, we describe a probabilistic method for ranking putative plant miRNAs using a naïve Bayes classifier and its publicly available implementation. We use a number of properties to construct the classifier, including sequence length, number of observations, existence of detectable predicted miRNA* sequences, the distribution of nearby reads and mapping multiplicity. We apply the method to small RNA sequence data from soybean, peach, Arabidopsis and rice and provide experimental validation of several predictions in soybean. The approach performs well overall and strongly enriches for known miRNAs over other types of sequences. By utilizing a Bayesian approach to rank putative miRNAs, our method is able to score miRNAs that would be eliminated by other methods, such as those that have low counts or lack detectable miRNA* sequences. As a result, we are able to detect several soybean miRNA candidates, including some that are 24 nucleotides long, a class that is almost universally eliminated by other methods.
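
To make the ranking idea in the summary concrete, the following is a minimal sketch of a naïve Bayes scorer over categorical features. It is not the authors' published implementation. The feature names (`length`, `count_bin`, `has_star`, `multiplicity_bin`), the binning, the Laplace smoothing and the toy training data are all assumptions made for illustration, and the nearby-read distribution feature is omitted for brevity. Each candidate receives a log posterior odds score, so a sequence with a low count or no detectable miRNA* is penalized rather than discarded outright.

```python
import math
from collections import Counter, defaultdict

# Hypothetical feature names; the summary lists the properties but not their encoding.
FEATURES = ("length", "count_bin", "has_star", "multiplicity_bin")


def train(examples, alpha=1.0):
    """Count feature values per class. `examples` is a list of (features, label),
    where label is True for a known miRNA and False for other small RNAs."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (feature, label) -> counts of each value
    values_seen = defaultdict(set)       # feature -> set of observed values
    for feats, label in examples:
        for name in FEATURES:
            value_counts[(name, label)][feats[name]] += 1
            values_seen[name].add(feats[name])
    return {"class_counts": class_counts, "value_counts": value_counts,
            "values_seen": values_seen, "alpha": alpha}


def log_odds(model, feats):
    """Log posterior odds of miRNA vs. non-miRNA under the independence assumption."""
    cc, alpha = model["class_counts"], model["alpha"]
    # Smoothed class prior ratio.
    score = math.log((cc[True] + alpha) / (cc[False] + alpha))
    for name in FEATURES:
        value = feats[name]
        k = len(model["values_seen"][name] | {value})  # number of possible values
        # Laplace-smoothed likelihood of this value under each class.
        p_pos = (model["value_counts"][(name, True)][value] + alpha) / (cc[True] + alpha * k)
        p_neg = (model["value_counts"][(name, False)][value] + alpha) / (cc[False] + alpha * k)
        score += math.log(p_pos / p_neg)
    return score


def rank(model, candidates):
    """Sort (name, features) pairs from most to least miRNA-like."""
    return sorted(candidates, key=lambda c: log_odds(model, c[1]), reverse=True)


# Toy training set (fabricated for illustration only, not data from the paper).
training = [
    ({"length": 21, "count_bin": "high", "has_star": True, "multiplicity_bin": "unique"}, True),
    ({"length": 21, "count_bin": "high", "has_star": True, "multiplicity_bin": "unique"}, True),
    ({"length": 22, "count_bin": "low", "has_star": False, "multiplicity_bin": "unique"}, True),
    ({"length": 24, "count_bin": "low", "has_star": False, "multiplicity_bin": "multi"}, False),
    ({"length": 24, "count_bin": "low", "has_star": False, "multiplicity_bin": "multi"}, False),
    ({"length": 23, "count_bin": "high", "has_star": False, "multiplicity_bin": "multi"}, False),
]
model = train(training)

candidates = [
    ("cand_A", {"length": 21, "count_bin": "high", "has_star": True, "multiplicity_bin": "unique"}),
    ("cand_B", {"length": 24, "count_bin": "low", "has_star": False, "multiplicity_bin": "multi"}),
    ("cand_C", {"length": 21, "count_bin": "low", "has_star": False, "multiplicity_bin": "unique"}),
]
ranking = rank(model, candidates)
for name, feats in ranking:
    print(name, round(log_odds(model, feats), 3))
```

In this toy setup, `cand_C` has a low count and no miRNA* yet still receives a score and a place in the ranking, which is the behaviour the summary contrasts with hard filtering.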
