A Bayesian method for finding regulatory segments in DNA
Author(s) - Crowley Evelyn M.
Publication year - 2001
Publication title - Biopolymers
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.556
H-Index - 125
eISSN - 1097-0282
pISSN - 0006-3525
DOI - 10.1002/1097-0282(200102)58:2<165::aid-bip50>3.0.co;2-o
Subject(s) - sequence (biology), DNA, computational biology, gene, regulatory sequence, DNA sequencing, hidden Markov model, human genome, Markov chain, Markov chain Monte Carlo, genetics, sequence analysis, base pair, Bayesian probability, genome, biology, computer science, regulation of gene expression, artificial intelligence, machine learning
A goal of the human genome project is to determine the entire sequence of DNA (3 × 10⁹ base pairs) found in chromosomes. The massive amounts of data produced by this project require interpretation. A Bayesian model is developed for locating regulatory regions in a DNA sequence. Regulatory regions are areas of DNA to which specific proteins bind and control whether or not a gene is transcribed to produce templates for protein synthesis. Each human cell contains the same DNA sequence. Thus the particular function of different cells is determined by the genes that are transcribed in that cell. A Hidden Markov chain is used to model whether a small interval of the DNA is in a regulatory region or not. This can be regarded as a changepoint problem where the changepoints are the start of a regulatory or nonregulatory region. The data consist of protein‐binding elements, which are short subsequences, or “words,” in the DNA sequence. Although these words can occur anywhere in the sequence, a larger number are expected in regulatory regions. Therefore, regulatory regions are detected by locating clusters of words. For a particular DNA sequence, the model automatically selects those words that best predict regions of interest. Markov chain Monte Carlo methods are used to explore the posterior distribution of the Hidden Markov chain. The model is tested by means of simulations, and applied to several DNA sequences. © 2001 John Wiley & Sons, Inc. Biopoly 58: 165–174, 2001
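
The idea of detecting regulatory regions as clusters of words under a two-state hidden Markov chain can be illustrated with a small sketch. This is not the paper's model or its Markov chain Monte Carlo sampler: it assumes fixed Poisson word-count rates per state, a fixed probability of staying in the same state, and a fixed word list, and it computes posterior state probabilities with the standard forward-backward recursions instead of sampling. The function names (`count_words`, `posterior_regulatory`), the window size, and all parameter values are hypothetical choices made for illustration.

```python
import math


def count_words(seq, words, window):
    """Count word occurrences (by start position) in consecutive windows."""
    n_win = (len(seq) + window - 1) // window
    counts = [0] * n_win
    for i in range(len(seq)):
        if any(seq.startswith(w, i) for w in words):
            counts[i // window] += 1
    return counts


def poisson_logpmf(k, rate):
    """Log of the Poisson probability of observing k events at the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def posterior_regulatory(counts, rates=(0.2, 2.0), stay=0.9):
    """Forward-backward on a two-state chain (0 = nonregulatory, 1 = regulatory).

    Assumed, not taken from the paper: Poisson emissions with fixed rates,
    a symmetric transition matrix, and a uniform initial distribution.
    Returns P(state = regulatory | all counts) for each window.
    """
    n = len(counts)
    states = (0, 1)
    trans = [[stay, 1 - stay], [1 - stay, stay]]
    emit = [[math.exp(poisson_logpmf(k, r)) for r in rates] for k in counts]

    # Forward pass, normalised at each step to avoid underflow.
    fwd = []
    prev = [0.5, 0.5]
    for t in range(n):
        if t == 0:
            a = [prev[s] * emit[0][s] for s in states]
        else:
            a = [sum(prev[r] * trans[r][s] for r in states) * emit[t][s]
                 for s in states]
        z = sum(a)
        prev = [x / z for x in a]
        fwd.append(prev)

    # Backward pass, also normalised.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in states)
             for s in states]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Combine into per-window posterior probability of the regulatory state.
    post = []
    for t in range(n):
        g = [fwd[t][s] * bwd[t][s] for s in states]
        post.append(g[1] / sum(g))
    return post


# Toy sequence: a cluster of "TATA" words flanked by word-free stretches.
seq = "A" * 40 + "TA" * 10 + "A" * 40
counts = count_words(seq, ["TATA"], window=10)
post = posterior_regulatory(counts)
```

In this toy input the windows covering the word cluster receive a high posterior probability of being regulatory, and the boundaries where that probability rises or falls play the role of the changepoints. A fuller treatment along the lines of the abstract would place priors on the rates, the transition probabilities, and the choice of words, and would explore the joint posterior by sampling.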
