
LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
Author(s) - Sean Cascarina, David C. King, Erin Osborne Nishimura, Eric D. Ross
Publication year - 2021
Publication title - NAR Genomics and Bioinformatics
Language(s) - English
Resource type - Journals
ISSN - 2631-9268
DOI - 10.1093/nargab/lqab048
Subject(s) - UniProt, computational biology, feature (linguistics), identification (biology), computer science, amino acid, proteome, liquid crystal display, biology, sequence (biology), artificial intelligence, bioinformatics, genetics, gene, linguistics, philosophy, botany, operating system
Low-complexity domains (LCDs) in proteins are regions composed predominantly of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretic complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental, biologically relevant feature underlying protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
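
To make the idea of a composition-centric search concrete, the following is a minimal sketch of a sliding-window composition scan. It is not the LCD-Composer implementation: the function name, the window size, and the composition threshold are illustrative assumptions, and the linear dispersion criterion mentioned in the abstract is omitted entirely. The sketch only shows how regions enriched in a chosen set of amino acids might be located and merged.

```python
from typing import List, Tuple


def find_composition_domains(
    seq: str,
    residues: str = "Q",        # hypothetical choice: amino acid(s) of interest
    window: int = 20,           # hypothetical window size, not a documented default
    min_fraction: float = 0.4,  # hypothetical composition threshold
) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) intervals in which every
    contributing window has at least `min_fraction` of its positions
    occupied by any of the amino acids in `residues`."""
    if window <= 0 or len(seq) < window:
        return []

    targets = set(residues)
    hits = [1 if aa in targets else 0 for aa in seq]

    domains: List[Tuple[int, int]] = []
    count = sum(hits[:window])  # number of target residues in the first window

    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one position: add the entering residue, drop the leaving one.
            count += hits[start + window - 1] - hits[start - 1]

        if count / window >= min_fraction:
            end = start + window
            if domains and start <= domains[-1][1]:
                # Overlaps or touches the previous domain: extend it.
                domains[-1] = (domains[-1][0], end)
            else:
                domains.append((start, end))

    return domains
```

Calling `find_composition_domains(protein_sequence, residues="QN")` would, under these assumptions, report regions where glutamine and asparagine together make up at least 40% of each 20-residue window. Passing several residues as one group is one simple way to express a combined composition criterion; requiring separate thresholds for different residues would need an additional check per window.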