Open Access
Neural networks to learn protein sequence–function relationships from deep mutational scanning data
Author(s) - Sam Gelman, Sarah A. Fahlberg, Pete Heinzelman, Philip A. Romero, Anthony Gitter
Publication year - 2021
Publication title - Proceedings of the National Academy of Sciences of the United States of America
Language(s) - English
Resource type - Journals
eISSN - 1091-6490
pISSN - 0027-8424
DOI - 10.1073/pnas.2104878118
Subject(s) - sequence (biology), artificial intelligence, protein function prediction, computer science, protein sequencing, deep learning, sequence space, sequence learning, computational biology, convolutional neural network, protein design, sequence motif, artificial neural network, function (biology), protein structure, machine learning, peptide sequence, biology, protein function, genetics, mathematics, gene, biochemistry, pure mathematics, Banach space
Significance
Understanding the relationship between protein sequence and function is necessary to design new and useful proteins with applications in bioenergy, medicine, and agriculture. The mapping from sequence to function is tremendously complex because it involves thousands of molecular interactions that are coupled across multiple length scales and timescales. We show that neural networks can learn the sequence–function mapping from large protein datasets. Neural networks are appealing for this task because they can learn complicated relationships from data, make few assumptions about the nature of the sequence–function relationship, and learn general rules that apply across the length of the protein sequence. We demonstrate that learned models can be applied to design new proteins with properties that exceed those of natural sequences.