Computational challenges in combinatorial library design for protein engineering
Author(s) -
Moore Gregory L.,
Maranas Costas D.
Publication year - 2004
Publication title -
AIChE Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.958
H-Index - 167
eISSN - 1547-5905
pISSN - 0001-1541
DOI - 10.1002/aic.10025
Subject(s) - library science, citation, state (computer science), engineering, computer science, algorithm
Through the processes of natural selection and co-option, nature has crafted an astounding array of proteins whose functional repertoire ranges from catalysis, signaling, recognition, and regulation to compartmentalization and repair. Despite this plethora of functionalities and exquisite specialization, many biotechnological tasks require proteins to exhibit properties or operate under conditions that were not selected for in nature, such as enhanced thermostability, altered substrate specificity, different cofactor (e.g., NADH, ATP) dependence, nonaqueous environments and, often, combinations of the above. Unlike many of the systems engineered by people, proteins had to acquire through evolution the inherent ability to change and, over time, assume subtly or even dramatically different roles in living organisms. This amazing plasticity has enabled bioengineers to design, or more often redesign, proteins that are better attuned to specific tasks.
Protein engineering, however, remains a formidable challenge. Proteins are much larger (typically over 50 residues) than nonbiological catalysts and exhibit complex networks of dynamic interactions necessary for function. Given the residue composition of a protein, the task of identifying its three-dimensional (3-D) structure de novo is nontrivial, and only limited successes (Bradley et al., 2003) have been reported so far. On top of this, even complete structure resolution does not guarantee that function is fully elucidated. In many cases, functionality and nonfunctionality are separated by differences of only fractions of an Angstrom in the position of certain key atoms, an accuracy threshold well beyond the current state of the art in modeling. These daunting challenges have led to protein engineering paradigms that involve the synthesis and subsequent screening of multiple protein candidates (from tens to billions) as a way of hedging against the imprecise knowledge of sequence-structure-function relations.
This juxtaposition of repeated library generation and screening has emerged as the directed evolution design paradigm. Directed evolution methods mimic the process of Darwinian evolution and selection to produce proteins or even entire metabolic pathways with improved properties. These methods (see Figure 1) typically begin with the infusion of diversity into a small set of parental nucleotide sequences through mutagenesis and/or DNA recombination.
Correspondence concerning this article should be addressed to C. D. Maranas at costas@psu.edu. G. L. Moore’s e-mail address is glm113@psu.edu.
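As a purely illustrative aside, the cycle of diversity generation and screening described above can be sketched as a toy loop in Python. This is not the authors' method. The function names, the single-crossover recombination, the per-residue mutation rate, and especially the `toy_fitness` screen (which simply counts matches to an arbitrary target string) are all assumptions made for illustration. The sketch also works on amino-acid strings for brevity, whereas the text describes diversification at the nucleotide level.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(seq, rate, rng):
    """Point mutagenesis: each residue is redrawn at random with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def recombine(a, b, rng):
    """Single-crossover recombination of two equal-length parents (needs length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def toy_fitness(seq, target):
    """Hypothetical stand-in for a screen: number of positions matching `target`."""
    return sum(x == y for x, y in zip(seq, target))


def directed_evolution(parents, target, rounds=10, library_size=200,
                       keep=5, rate=0.05, seed=0):
    """Repeat: build a library from the current pool, screen it, keep the best."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        # Carry the current pool forward so the best score never decreases.
        library = list(pool)
        while len(library) < library_size:
            a, b = rng.choice(pool), rng.choice(pool)
            library.append(mutate(recombine(a, b, rng), rate, rng))
        # "Screening": rank every library member and retain the top `keep`.
        library.sort(key=lambda s: toy_fitness(s, target), reverse=True)
        pool = library[:keep]
    return pool


if __name__ == "__main__":
    parents = ["MKTAYIAKQRQI", "MKSAYLAKQRQL"]  # made-up parental sequences
    target = "MKTWYIAKERQL"                     # made-up screening target
    best = directed_evolution(parents, target)[0]
    print(best, toy_fitness(best, target))
```

In a real campaign the screen is an experimental assay, not a known target, which is precisely why library design matters: only a tiny fraction of the possible variants can ever be synthesized and tested.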