Toward the solution of the protein‐structure prediction problem
Author(s) - Zhang Yang
Publication year - 2021
Publication title - The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.2021.35.s1.00043
Proteins are one‐dimensional chains of amino acids that fold into unique three‐dimensional shapes in the physiological environment to perform their biological and cellular functions. Protein structure prediction aims to determine the spatial shape of a protein molecule, i.e., the location of every atom, from its amino acid sequence by computational modeling. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), protein structure prediction has historically been divided into template‐based modeling (TBM) and template‐free modeling (FM, or ab initio folding). In this talk, we first review the important milestones of the last decades in computer‐based protein structure prediction and show that the problem could be solved in principle by TBM if fold‐recognition algorithms could identify the best structural templates in the PDB. Next, we discuss protein structure prediction results from the recent community‐wide blind CASP experiments, which show that new approaches combining ab initio folding with deep neural‐network contact and distance predictions, built on residue coevolution data from multiple sequence alignments, can consistently and successfully fold large proteins with complicated shapes and topologies. In particular, the end‐to‐end training system built by the DeepMind team and powered by attention‐based neural networks, which enables self‐feature learning and local structural error estimation and refinement, could fold nearly all protein domains in the CASP14 experiment, with two‐thirds of them reaching an accuracy comparable to experimental solutions. These achievements essentially break through the 50‐year‐old modeling border between TBM and FM and make the success of high‐resolution structure prediction no longer dependent on the PDB library. In particular, the progress in CASP14 marked a solution, at least at the fold level, to the single‐domain protein structure prediction problem.
Nevertheless, the construction of atomic‐resolution models for multi‐domain proteins and protein‐protein complexes remains challenging for the community. Given the recent revolutionary breakthroughs, it is expected that this problem will be solved in the foreseeable future by integrating deep machine learning techniques with rapidly growing genome sequencing databases, aided by advanced structure assembly simulation algorithms.