ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements

James Taylor, Svitlana Tyekucheva, David King, Ross C. Hardison, Webb Miller, Francesca Chiaromonte


Open Access

Author(s) - James Taylor, Svitlana Tyekucheva, David King, Ross C. Hardison, Webb Miller, Francesca Chiaromonte

Publication year - 2006

Publication title - Genome Research

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 9.556

H-Index - 297

eISSN - 1549-5469

pISSN - 1088-9051

DOI - 10.1101/gr.4537706

Subject(s) - biology, set (abstract data type), sequence (biology), computational biology, sequence alignment, function (biology), constraint (computer aided design), class (philosophy), genome, conserved sequence, multiple sequence alignment, limiting, computer science, artificial intelligence, genetics, base sequence, gene, peptide sequence, mathematics, mechanical engineering, geometry, engineering, programming language

Genomic sequence signals, such as base composition, presence of particular motifs, or evolutionary constraint, have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).
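
To make the idea of a reduced representation concrete, the toy Python sketch below maps each alignment column to one of a handful of symbols and then scores an alignment by comparing two Markov models, one trained on positive examples and one on negative examples. It is not the ESPERR implementation. ESPERR learns its encoding from training data, whereas `reduce_column` here is a hand-written, hypothetical encoding based only on GC content and conservation. The abstract also does not specify the scoring model, so the log-odds ratio of two fixed-order Markov models is an assumption made for illustration. All function names and the toy alignments are invented.

```python
import math
from collections import defaultdict

# Hypothetical reduced alphabet: S/W = conserved GC/AT column,
# s/w = non-conserved GC/AT column, x = anything else (gap, N, ...).
ALPHABET = ["S", "W", "s", "w", "x"]

def columns(alignment):
    """Turn equal-length aligned rows (first row = reference) into column strings."""
    return ["".join(col) for col in zip(*alignment)]

def reduce_column(col):
    """Hand-written stand-in for a learned encoding (an assumption, not ESPERR's)."""
    ref = col[0]
    if ref not in "ACGT":
        return "x"
    conserved = all(base == ref for base in col)
    if ref in "GC":
        return "S" if conserved else "s"
    return "W" if conserved else "w"

def encode(alignment):
    """Encode a whole alignment as a sequence of reduced symbols."""
    return [reduce_column(col) for col in columns(alignment)]

def train(seqs, order=1, pseudo=1.0):
    """Fit a fixed-order Markov model with additive smoothing; return P(sym | ctx)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1.0

    def prob(ctx, sym):
        seen = counts.get(ctx, {})
        total = sum(seen.values()) + pseudo * len(ALPHABET)
        return (seen.get(sym, 0.0) + pseudo) / total

    return prob

def log_odds(seq, pos, neg, order=1):
    """Mean per-position log-odds of the positive model versus the negative model."""
    total = 0.0
    for i in range(order, len(seq)):
        ctx = tuple(seq[i - order:i])
        total += math.log(pos(ctx, seq[i]) / neg(ctx, seq[i]))
    return total / max(1, len(seq) - order)

# Invented toy training data: three aligned rows per example.
positive = [["GCGCGGCC", "GCGCGGCC", "GCGCGGCC"],
            ["CCGGATGC", "CCGGATGC", "CCGGTTGC"]]
negative = [["ATTAGCAT", "ACTTGAAA", "TTTAGCTT"],
            ["TTAACGAT", "TAATCCAA", "ATAAGGTT"]]

pos_model = train([encode(a) for a in positive])
neg_model = train([encode(a) for a in negative])

conserved_gc = encode(["GGCCGC", "GGCCGC", "GGCCGC"])
diverged_at = encode(["ATTTAA", "TTATAT", "AATTTA"])
score_conserved = log_odds(conserved_gc, pos_model, neg_model)
score_diverged = log_odds(diverged_at, pos_model, neg_model)
```

In this toy setup a positive score means the alignment looks more like the positive training examples than the negative ones. The sketch only picks up the strong signals (GC content and conservation) because those are hard-coded into the encoding, and it says nothing about how the reported accuracy was obtained.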
