Heterogeneity in DNA Multiple Alignments: Modeling, Inference, and Applications in Motif Finding

Author(s) - Chen Gong, Zhou Qing

Publication year - 2010

Publication title - Biometrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.298

H-Index - 130

eISSN - 1541-0420

pISSN - 0006-341X

DOI - 10.1111/j.1541-0420.2009.01362.x

Subject(s) - hidden Markov model, inference, DNA binding site, computer science, computational biology, Markov chain, Bayes' theorem, Gibbs sampling, motif (music), Bayesian inference, Bayesian probability, sequence motif, multiple sequence alignment, sequence alignment, biology, data mining, genetics, artificial intelligence, gene, machine learning, gene expression, promoter, physics, acoustics, peptide sequence

Summary - Transcription factors bind sequence-specific sites in DNA to regulate gene transcription. Identifying transcription factor binding sites (TFBSs) is an important step toward understanding gene regulation. Although computational methods for TFBS detection and motif finding model TFBSs and their combinatorial patterns in sophisticated ways, they often make oversimplified, homogeneous assumptions about background sequences. Because nucleotide base composition varies across genomic regions, incorporating this heterogeneity into background modeling is expected to help motif finding. When sequences from multiple species are used, variation in evolutionary conservation violates the common assumption of an identical conservation level across a multiple alignment. To handle both types of heterogeneity, we propose a generative model in which a segmented Markov chain partitions a multiple alignment into regions of homogeneous nucleotide base composition and a hidden Markov model (HMM) accounts for different conservation levels. Bayesian inference on the model is developed via Gibbs sampling with dynamic programming recursions. Simulation studies and empirical evidence from biological data sets reveal the dramatic effect of background modeling on motif finding and demonstrate that the proposed approach achieves substantial improvements over commonly used background models.
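The summary names Gibbs sampling with dynamic programming recursions but gives no detail. One standard recursion of this kind for an HMM is forward filtering followed by backward sampling, which draws a whole hidden-state path in a single Gibbs step. The sketch below illustrates only that general technique and is not the authors' model: it assumes each alignment column has been reduced to a binary "conserved across species" indicator, assumes two conservation states, and uses a hypothetical function name and made-up parameter values.

```python
import random


def sample_conservation_path(obs, init, trans, emit, rng):
    """Draw one hidden-state path from its posterior given the observations.

    obs   : list of ints, one observed symbol per alignment column
    init  : init[k] = P(first state = k)
    trans : trans[j][k] = P(next state = k | current state = j)
    emit  : emit[k][o] = P(observe o | state = k)
    rng   : a random.Random instance
    """
    if not obs:
        return []
    n, n_states = len(obs), len(init)

    # Forward pass: alpha[t][k] = P(state_t = k | obs[0..t]),
    # normalized at every column to avoid numerical underflow.
    alpha = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[k] * emit[k][o] for k in range(n_states)]
        else:
            prev = alpha[-1]
            row = [
                emit[k][o] * sum(prev[j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        total = sum(row)
        alpha.append([v / total for v in row])

    # Backward pass: sample the last state from alpha[-1], then each earlier
    # state with weight alpha[t][j] * trans[j][next_state].
    path = [0] * n
    path[-1] = rng.choices(range(n_states), weights=alpha[-1])[0]
    for t in range(n - 2, -1, -1):
        nxt = path[t + 1]
        weights = [alpha[t][j] * trans[j][nxt] for j in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=weights)[0]
    return path


# Hypothetical toy input: 1 = column identical in all species, 0 = not.
columns = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
init = [0.5, 0.5]                      # state 0 = low, state 1 = high conservation
trans = [[0.9, 0.1], [0.1, 0.9]]       # states tend to persist along the alignment
emit = [[0.7, 0.3], [0.2, 0.8]]        # emit[state][observed symbol]
print(sample_conservation_path(columns, init, trans, emit, random.Random(0)))
```

In a full Gibbs sampler, a draw like this would alternate with updates of the transition and emission parameters given the sampled path; those steps, and the segmentation by base composition, are left out here.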
