On November 30, 2020, Google subsidiary DeepMind made a remarkable announcement: the famous protein folding problem, which has stumped researchers for some 50 years, has been declared solved. The solution uses a deep learning system named AlphaFold 2 that predicts the folded 3D structure of a protein with a median accuracy score of 92.4 GDT.

In 1972, Nobel laureate Christian B. Anfinsen hypothesized that a protein's three-dimensional structure should be fully determined by its one-dimensional amino acid sequence. The challenge of predicting the structure a protein will fold into from its amino acid sequence alone became known as the protein folding problem.

"We have been stuck on this one problem – how do proteins fold up – for nearly 50 years. To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts, wondering if we’d ever get there, is a very special moment."

- Professor John Moult, Co-Founder and Chair of CASP, University of Maryland

Traditionally, protein structures have been determined experimentally, using costly techniques such as X-ray crystallography. These methods are limited in scale by budget and rely heavily on trial and error.

Advances in scientific computing have unleashed a new wave of researchers applying supercomputing and deep learning to the protein folding problem.

In December 1999, IBM announced a USD 100 million investment in the newly conceived Blue Gene project, an effort to build a supercomputer capable of petaflop performance (one quadrillion floating-point operations per second) to help researchers run molecular simulations. These simulations would give scientists a new way to study the molecular motions involved in protein folding and misfolding.

DeepMind has been working on the AlphaFold project for more than four years. Rather than a "brute force" simulation approach, DeepMind turned to deep learning, training neural networks with gradient-based optimization in the hope that they could accurately predict 3D structures from amino acid sequences. The investment appears to have paid off.

At the 14th edition of the biennial CASP (Critical Assessment of Techniques for Protein Structure Prediction), AlphaFold 2 shattered previous records, achieving a median score of 92.4 GDT across all targets. The Global Distance Test (GDT) measures the accuracy of a predicted structure on a scale from 0 to 100, roughly the percentage of amino acid residues placed within a threshold distance of their correct positions. A score of around 90 GDT is informally considered competitive with results obtained from experimental methods.
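To make the 0 to 100 scale more concrete, here is a minimal sketch of GDT_TS, the commonly reported "total score" variant: for each of four distance cutoffs (1, 2, 4 and 8 Å), take the fraction of residues whose predicted position lies within the cutoff of the true position, then average the four fractions and scale to 100. It assumes each structure is given as a list of (x, y, z) C-alpha coordinates and that the two structures are already superimposed; the full method searches over many superpositions to maximize the count at each cutoff, which this sketch skips. The names `gdt_ts` and `CUTOFFS` are hypothetical, chosen for illustration.

```python
import math

# Distance cutoffs in angstroms used by GDT_TS.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, reference, cutoffs=CUTOFFS):
    """Simplified GDT_TS for two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already optimally superimposed, so no
    superposition search is performed here.
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and have the same number of residues")
    n = len(reference)
    # Per-residue distance between predicted and true positions.
    deviations = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    # Average over cutoffs, expressed on a 0-100 scale.
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    # Residues displaced by 0.5, 1.5, 3.0 and 10.0 angstroms respectively.
    predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
    print(gdt_ts(predicted, reference))  # 56.25
```

In this toy example the fractions within 1, 2, 4 and 8 Å are 0.25, 0.5, 0.75 and 0.75, so the score is 56.25; a perfect prediction scores 100. Averaging over several cutoffs rewards a model both for getting residues almost exactly right and for getting the overall fold approximately right.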

AlphaFold 2 is a next-generation deep learning system trained using 128 of Google's TPUv3 cores. Its redesigned architecture allows it to outperform AlphaFold 1 while using similar training data. AlphaFold 1 competed in 2018 at CASP13 and placed first, producing the most accurate prediction for 25 of the 43 target proteins.

Researchers hope that progress on protein folding will speed up the search for treatments for diseases linked to protein misfolding, such as Alzheimer's. The true extent of AlphaFold's contributions to scientific discovery is only starting to take shape.

Sources and more reading

AlphaFold: a solution to a 50-year-old grand challenge in biology
AlphaFold: Using AI for scientific discovery
IBM100 - Blue Gene
Protein modeling with Blue Gene/L