Open Access
A unified approach for quantifying and interpreting DNA shape readout by transcription factors
Author(s) - Rube H Tomas, Rastogi Chaitanya, Kribelbauer Judith F, Bussemaker Harmen J
Publication year - 2018
Publication title - Molecular Systems Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 8.523
H-Index - 148
ISSN - 1744-4292
DOI - 10.15252/msb.20177902
Subject(s) - dna, biology, computational biology, twist, biological system, transcription factor, representation (politics), transcription (linguistics), dna sequencing, sequence (biology), genetics, algorithm, computer science, gene, mathematics, geometry, linguistics, philosophy, politics, political science, law
Abstract - Transcription factors (TFs) interpret DNA sequence by probing the chemical and structural properties of the nucleotide polymer. DNA shape is thought to enable a parsimonious representation of dependencies between nucleotide positions. Here, we propose a unified mathematical representation of the DNA sequence dependence of shape and TF binding, respectively, which simplifies and enhances analysis of shape readout. First, we demonstrate that linear models based on mononucleotide features alone account for 60–70% of the variance in minor groove width, roll, helix twist, and propeller twist. This explains why simple scoring matrices that ignore all dependencies between nucleotide positions can partially account for DNA shape readout by a TF. Adding dinucleotide features as sequence‐to‐shape predictors to our model, we can almost perfectly explain the shape parameters. Building on this observation, we developed a post hoc analysis method that can be used to analyze any mechanism‐agnostic protein–DNA binding model in terms of shape readout. Our insights provide an alternative strategy for using DNA shape information to enhance our understanding of how cis‐regulatory codes are interpreted by the cellular machinery.
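
The comparison described in the abstract (mononucleotide features alone versus mononucleotide plus dinucleotide features as linear predictors of a shape parameter) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the pentamer "shape table" is synthetic, generated from random weights, and the helper names (`mono_features`, `di_features`, `r_squared`) are hypothetical. It is not the authors' code or data, and the variance fractions it yields are not the paper's results.

```python
import itertools
import numpy as np

BASES = "ACGT"

def mono_features(seq):
    """One-hot encode each position (4 indicators per position)."""
    x = np.zeros(len(seq) * 4)
    for i, base in enumerate(seq):
        x[4 * i + BASES.index(base)] = 1.0
    return x

def di_features(seq):
    """One-hot encode each adjacent dinucleotide (16 indicators per step)."""
    x = np.zeros((len(seq) - 1) * 16)
    for i in range(len(seq) - 1):
        pair = 4 * BASES.index(seq[i]) + BASES.index(seq[i + 1])
        x[16 * i + pair] = 1.0
    return x

def r_squared(X, y):
    """Fraction of variance explained by a least-squares fit with intercept."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    # lstsq copes with the rank-deficient one-hot design (minimum-norm solution).
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

# ASSUMPTION: synthetic shape values for all 1024 pentamers, built from
# random mononucleotide and dinucleotide weights. A real analysis would
# use a table of predicted or measured shape parameters instead.
rng = np.random.default_rng(0)
kmers = ["".join(p) for p in itertools.product(BASES, repeat=5)]
X_mono = np.array([mono_features(s) for s in kmers])
X_di = np.array([di_features(s) for s in kmers])
shape = X_mono @ rng.normal(size=20) + 0.5 * (X_di @ rng.normal(size=64))

r2_mono = r_squared(X_mono, shape)                       # position-independent model
r2_both = r_squared(np.hstack([X_mono, X_di]), shape)    # adds adjacent-pair terms

print(f"mononucleotide only:           R^2 = {r2_mono:.3f}")
print(f"mononucleotide + dinucleotide: R^2 = {r2_both:.3f}")
```

Because the toy values contain no terms beyond adjacent dinucleotides, the second fit is exact by construction; with real shape tables the fit would be expected to be close to, but not exactly, perfect.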