Evolutionary-scale prediction of atomic-level protein structure with a language model
Author(s) - Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives
Publication year - 2023
Publication title - Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 12.556
H-Index - 1186
eISSN - 1095-9203
pISSN - 0036-8075
DOI - 10.1126/science.ade2574
Subject(s) - metagenomics, computer science, inference, protein structure prediction, construct (python library), sequence (biology), protein structure, scale (ratio), artificial intelligence, machine learning, computational biology, biology, genetics, geography, cartography, biochemistry, gene, programming language
Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.