Making Functional Predictions Using Local Spatial Arrangements in the Haloacid Dehalogenase Superfamily
Author(s) -
Ruffner Lydia Armentha,
Pina Manuel,
Beuning Penny J.,
Ondrechen Mary Jo
Publication year - 2020
Publication title -
The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.2020.34.s1.07027
Subject(s) - dehalogenase , structural genomics , computational biology , mega , phylogenetic tree , function (biology) , genome , biology , superfamily , enzyme , protein structure , gene , biochemistry , genetics , physics , astronomy
Beginning with the sequencing of the human genome in the 1990s, major efforts have been underway to expand our working knowledge of proteins. Between 2000 and 2015, the Protein Structure Initiative (PSI) solved more than 15,000 protein structures, but most of them have unknown or uncertain biochemical function, or an incorrectly assigned putative function. With so many structures available, methods are needed to annotate these proteins functionally in order to identify potential applications, such as bioremediation, and to improve understanding of cellular processes. Enzymes in the Haloacid Dehalogenase Superfamily (HADSF) possess a wide range of functions, including phosphatases important in cell membrane biosynthesis and dehalogenases that can detoxify and degrade halogenated compounds. The Ondrechen Research Group (ORG) at Northeastern University has developed methods that are being used to predict biochemical function for Structural Genomics (SG) proteins in the HADSF. These methods are Partial Order Optimum Likelihood (POOL) and Structurally Aligned Local Sites of Activity (SALSA). POOL is a machine learning method that predicts which residues are catalytically active or otherwise important for protein function. It uses electrostatic features and metrics from THEoretical Microscopic Anomalous TItration Curve Shapes (THEMATICS), ligand binding pocket and geometric features from ConCavity, and evolutionary scores and phylogenetic trees from INformation‐theoretic TREe traversal for Protein functional site IDentification (INTREPID). SALSA takes the functional residue predictions that POOL produces for SG proteins and aligns them with consensus signatures from known enzyme subfamilies according to the local spatial arrangement of the predicted residues at the active site.
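The idea of comparing predicted residues with subfamily consensus signatures can be illustrated with a toy sketch. This is not the SALSA implementation: the function names, the 0.5 conservation threshold, the scoring rule (fraction of signature positions matched), and the example data are all assumptions made for illustration. The sketch also assumes that residues have already been mapped to shared position indices by a structure-based alignment, which is the step that captures the local spatial arrangement.

```python
from collections import Counter


def consensus_signature(members, min_fraction=0.5):
    """Build a toy consensus signature for one subfamily.

    `members` is a list of dicts, one per known enzyme, mapping an
    aligned position index to the one-letter residue found there.
    A position enters the signature when its most common residue
    occurs in at least `min_fraction` of the members (assumed rule).
    """
    signature = {}
    positions = {pos for member in members for pos in member}
    for pos in sorted(positions):
        counts = Counter(m[pos] for m in members if pos in m)
        residue, count = counts.most_common(1)[0]
        if count / len(members) >= min_fraction:
            signature[pos] = residue
    return signature


def match_score(predicted, signature):
    """Fraction of signature positions where the predicted residue agrees."""
    if not signature:
        return 0.0
    hits = sum(1 for pos, res in signature.items() if predicted.get(pos) == res)
    return hits / len(signature)


def best_subfamily(predicted, signatures):
    """Name of the subfamily whose signature the prediction matches best."""
    return max(signatures, key=lambda name: match_score(predicted, signatures[name]))


# Hypothetical aligned functional residues for two made-up subfamilies.
subfamily_a = [{1: "D", 2: "D", 3: "K"}, {1: "D", 2: "D", 3: "K"}, {1: "D", 2: "T", 3: "K"}]
subfamily_b = [{1: "D", 2: "S", 4: "R"}, {1: "D", 2: "S", 4: "R"}]
signatures = {
    "subfamily_a": consensus_signature(subfamily_a),
    "subfamily_b": consensus_signature(subfamily_b),
}

# Hypothetical predicted residues for one protein of unknown function.
predicted = {1: "D", 2: "D", 3: "K", 5: "E"}
for name, sig in signatures.items():
    print(name, round(match_score(predicted, sig), 3))
print("best match:", best_subfamily(predicted, signatures))
```

In this toy setup, a query is assigned to whichever subfamily signature it matches most completely. A real workflow would also need a way to decide when no signature matches well enough to support a prediction.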
So far, using SALSA, we have made predictions for 20 SG proteins in the HADSF: one dehalogenase, eight sugar phosphatases, three NagD‐like phosphatases, four P‐type ATPases, and four soluble epoxide hydrolases. These predictions will be tested with biochemical assays to establish the function of each protein and to verify our computational approach to protein function prediction. The ability to predict computationally the biochemical function of protein structures of unknown or uncertain function adds tremendous value to genomics data.
Support or Funding Information
This project is funded by NSF CHE‐1305655, CHE‐1905214, and CHE‐1757078.