
Identifying transcriptional cis‐regulatory modules in animal genomes
Author(s) - Suryamohan Kushal, Halfon Marc S.
Publication year - 2014
Publication title - Wiley Interdisciplinary Reviews: Developmental Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.779
H-Index - 45
eISSN - 1759-7692
pISSN - 1759-7684
DOI - 10.1002/wdev.168
Subject(s) - computational biology, chromatin, enhancer, biology, chromatin immunoprecipitation, cis‐regulatory module, ChIA‐PET, histone, genome, regulatory sequence, transcription factor, promoter, gene, genetics, genomics, regulation of gene expression, nucleosome, gene expression
Gene expression is regulated through the activity of transcription factors (TFs) and chromatin‐modifying proteins acting on specific DNA sequences, referred to as cis‐regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis‐regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily identifiable sequence characteristics, and for many years were not amenable to high‐throughput discovery methods. However, the recent availability of complete genome sequences and the development of next‐generation sequencing methods have led to an explosion of both computational and empirical methods for CRM discovery in model and nonmodel organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against TFs or histone post‐translational modifications, identification of nucleosome‐depleted ‘open’ chromatin regions, or sequencing‐based high‐throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted TF‐binding sites, and supervised machine‐learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false‐positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed.
WIREs Dev Biol 2015, 4:59–84.
doi: 10.1002/wdev.168
This article is categorized under:
Gene Expression and Transcriptional Hierarchies > Regulatory Mechanisms
Gene Expression and Transcriptional Hierarchies > Gene Networks and Genomics
Technologies > Analysis of the Transcriptome
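
Of the computational approaches named in the abstract, clustering of known or predicted TF‐binding sites is the simplest to illustrate. The following is a minimal sketch only, not the authors' method or any specific published tool: it assumes binding sites have already been predicted and reduced to integer coordinates on a single sequence, and it flags regions where at least `min_sites` sites fall within `window` base pairs. The function name `find_site_clusters` and the default thresholds are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def find_site_clusters(
    positions: List[int],
    window: int = 500,    # assumed window width in bp (illustrative only)
    min_sites: int = 3,   # assumed minimum site count (illustrative only)
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_sites) for candidate binding-site clusters.

    A window qualifies when at least `min_sites` sites span no more than
    `window` bp. Qualifying windows that share sites are merged, so a
    reported cluster may be longer than `window`.
    """
    pos = sorted(positions)
    runs: List[List[int]] = []  # [first_index, last_index] into pos
    left = 0
    for right in range(len(pos)):
        # Advance the left edge until the window spans at most `window` bp.
        while pos[right] - pos[left] > window:
            left += 1
        if right - left + 1 >= min_sites:
            if runs and left <= runs[-1][1]:
                # Shares at least one site with the previous run: extend it.
                runs[-1][1] = right
            else:
                runs.append([left, right])
    return [(pos[i], pos[j], j - i + 1) for i, j in runs]


if __name__ == "__main__":
    # Made-up coordinates, purely to exercise the function.
    sites = [100, 150, 300, 5000, 9000, 9100, 9200, 9650, 20000]
    for start, end, n in find_site_clusters(sites):
        print(f"candidate region {start}-{end} ({n} sites)")
```

Density alone is a weak signal, which is consistent with the abstract's caution about false‐positive identifications: any region returned by a sketch like this would be a candidate for experimental confirmation, not a confirmed CRM.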