
High‐throughput discovery of functional disordered regions: investigation of transactivation domains
Author(s) -
Ravarani Charles NJ,
Erkina Tamara Y,
De Baets Greet,
Dudman Daniel C,
Erkine Alexandre M,
Babu M Madan
Publication year - 2018
Publication title -
Molecular Systems Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 8.523
H-Index - 148
ISSN - 1744-4292
DOI - 10.15252/msb.20188190
Subject(s) - biology, computational biology, transactivation, intrinsically disordered proteins, genome, function (biology), sequence (biology), transcription factor, sequence motif, proteome, genetics, DNA, gene, biochemistry
Over 40% of proteins in any eukaryotic genome contain intrinsically disordered regions (IDRs) that do not adopt defined tertiary structures. Certain IDRs perform critical functions, but discovering them is non‐trivial because their function depends on the biological context. We present IDR‐Screen, a framework to discover functional IDRs in a high‐throughput manner by simultaneously assaying large numbers of DNA sequences that code for short disordered sequences. Functionality‐conferring patterns in their protein sequences are then inferred through statistical learning. Using a yeast HSF1 transcription factor‐based assay, we discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library consisting of variants of 13 diverse TADs. Using machine learning, we find that segments devoid of positively charged residues but containing redundant short sequence patterns of negatively charged and aromatic residues constitute a generic feature of TAD functionality. We anticipate that investigating defined sequence libraries with IDR‐Screen for specific functions can facilitate the discovery of novel functional regions of the disordered proteome and help understand the impact of natural and disease variants in disordered segments.
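To make the reported sequence feature concrete, the toy sketch below scores a peptide on the properties the abstract names: the fraction of positively charged residues and how often aromatic residues occur near negatively charged ones. It is not the authors' IDR‐Screen pipeline or their trained model. The residue groupings, the neighbourhood window, the thresholds, and the names `composition_features`, `count_acidic_aromatic_pairs`, and `looks_tad_like` are all hypothetical choices made for illustration.

```python
from collections import Counter

# Residue classes using standard one-letter amino-acid codes.
# Assumption: histidine is left out of POSITIVE because it is mostly neutral at pH 7.
POSITIVE = frozenset("KR")   # lysine, arginine
NEGATIVE = frozenset("DE")   # aspartate, glutamate
AROMATIC = frozenset("FWY")  # phenylalanine, tryptophan, tyrosine


def _fraction(counts: Counter, group: frozenset, n: int) -> float:
    """Fraction of the n residues that belong to `group`."""
    return sum(counts[aa] for aa in group) / n


def composition_features(seq: str) -> dict:
    """Return the fractions of positive, negative and aromatic residues in seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {
        "positive": _fraction(counts, POSITIVE, n),
        "negative": _fraction(counts, NEGATIVE, n),
        "aromatic": _fraction(counts, AROMATIC, n),
    }


def count_acidic_aromatic_pairs(seq: str, window: int = 3) -> int:
    """Count aromatic residues that have at least one acidic residue within
    `window` positions on either side (a hypothetical 'short pattern' proxy)."""
    seq = seq.upper()
    hits = 0
    for i, aa in enumerate(seq):
        if aa not in AROMATIC:
            continue
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        if any(seq[j] in NEGATIVE for j in range(lo, hi) if j != i):
            hits += 1
    return hits


def looks_tad_like(seq: str, max_positive: float = 0.0, min_pairs: int = 2) -> bool:
    """Hypothetical rule of thumb: no (or few) positive residues and
    repeated acidic/aromatic neighbourhoods."""
    feats = composition_features(seq)
    return (feats["positive"] <= max_positive
            and count_acidic_aromatic_pairs(seq) >= min_pairs)


if __name__ == "__main__":
    for peptide in ("DDFWEDLSDYFE", "KRFWEDLSDYFE"):
        print(peptide, composition_features(peptide), looks_tad_like(peptide))
```

With these made-up peptides, `DDFWEDLSDYFE` passes the rule (no K/R, four aromatic residues each flanked by D or E), while `KRFWEDLSDYFE` fails only because of its two positive residues. A hand-written rule like this merely mirrors the qualitative pattern described above; the study itself inferred such patterns with machine learning on screening data.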