
AF-Cluster: How to predict protein structural conformations using sequence similarity clustering?

Written by Genophore | Jun 8, 2024 8:22:00 PM

Multiple-sequence alignment (MSA) is a foundational technique in bioinformatics: three or more biological sequences, such as protein or nucleic acid sequences, are aligned to identify regions of similarity. These similarities provide insight into the functional, structural, and evolutionary relationships between the sequences. In protein structure prediction, MSAs are particularly important because they reveal conserved residues and motifs that are critical for maintaining a protein's structure and function. AlphaFold2 (AF2) relies heavily on MSAs to predict protein structures with remarkable accuracy. By analyzing the evolutionary information embedded in an MSA, AF2 can make well-informed predictions about the three-dimensional conformation of a protein.
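To make "conserved residues" concrete, here is a minimal Python sketch that scores each column of a toy alignment by the fraction of sequences sharing the most common residue. The function name `column_conservation` and the toy sequences are invented for illustration, and this simple score is an assumption of the sketch; it is not how AF2 itself consumes an MSA.

```python
from collections import Counter


def column_conservation(msa):
    """Return, per alignment column, the fraction of sequences that
    carry the most common character in that column.

    Gap characters ('-') are counted like any other character here,
    which is a simplification.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*msa):  # iterate over columns, not rows
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(msa))
    return scores


# Hypothetical toy alignment: four sequences, five columns.
toy_msa = [
    "MKVLA",
    "MKILA",
    "MRVLG",
    "MKVLA",
]

scores = column_conservation(toy_msa)
print(scores)
```

Columns that score 1.0 are fully conserved in this toy alignment; lower scores mark positions where the sequences vary.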

Subsampling the MSA supplied to AF2 has been shown to enable the prediction of alternative protein conformations. Wayment-Steele et al. reported an approach called AF-Cluster, which adapts AF2 to predict protein conformational ensembles with high confidence. The method clusters the sequences in an MSA by sequence similarity and supplies the clusters to AF2 as separate inputs, which lets AF2 sample alternative conformations.
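The idea of splitting an MSA into similarity-based clusters can be sketched in a few lines of Python. This is not the AF-Cluster implementation: a simple greedy threshold clustering on Hamming distance stands in for whatever clustering method the published pipeline uses, and the names `hamming_fraction`, `greedy_cluster`, and `max_distance`, as well as the toy sequences, are assumptions made for illustration.

```python
def hamming_fraction(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(msa, max_distance=0.3):
    """Greedy threshold clustering (a stand-in, not the published method).

    Each sequence joins the first cluster whose representative (the
    cluster's first member) is within max_distance; otherwise it
    starts a new cluster.
    """
    clusters = []
    for seq in msa:
        for cluster in clusters:
            if hamming_fraction(seq, cluster[0]) <= max_distance:
                cluster.append(seq)
                break
        else:  # no existing cluster was close enough
            clusters.append([seq])
    return clusters


# Hypothetical toy alignment with two visibly different sequence families.
toy_msa = [
    "MKVLAAGTW",
    "MKVLAAGSW",
    "MKILAAGTW",
    "MRPEDCHTY",
    "MRPEDCHSY",
]

clusters = greedy_cluster(toy_msa, max_distance=0.3)
for i, cluster in enumerate(clusters):
    print(f"cluster {i}: {cluster}")
```

In a real workflow, each cluster would then be supplied to AF2 as its own smaller MSA, and the resulting models compared across clusters.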

The models that AF2 predicts are static, single-state structures. Proteins, however, are inherently dynamic and adopt multiple conformations as they carry out their biological functions, so a single predicted model does not capture this biologically important behavior. Developing methods that predict multiple protein conformations with AF2-level or near-AF2-level accuracy therefore remains an important goal.


AF-Cluster successfully predicted both states of the metamorphic proteins KaiB, RfaH, and MAD2, and it proved highly sensitive to point mutations. The authors used this sensitivity and predictive ability to design three point mutations intended to favor a particular fold-switched state of KaiB, and the mutations were validated experimentally.

AF-Cluster was also applied to screen for alternative states in protein families not known to switch folds. This screen identified a putative, previously undescribed alternative state for Mpt53, an oxidoreductase from Mycobacterium tuberculosis (Mtb).

With the more recent progress made by RoseTTAFold All-Atom and AlphaFold 3, could more accurate prediction of alternative protein conformations be the next milestone? The AlphaFold Protein Structure Database contains ~214 million single-structure predictions, and an estimated 0.5-4% of known proteins contain fold-switching domains. If that fraction holds across the database, predicting alternative protein conformations at scale could uncover roughly 1-8.6 million fold-switching proteins.
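The back-of-the-envelope estimate above can be reproduced with a short Python snippet. It assumes the 0.5-4% range applies uniformly to every entry in the database, which is a simplification; the variable names are illustrative only. Integer arithmetic in basis points is used to avoid floating-point rounding.

```python
# Figures quoted in the post.
total_predictions = 214_000_000   # ~214 million single-structure predictions
low_basis_points = 50             # 0.5% expressed in basis points (1/10,000)
high_basis_points = 400           # 4%   expressed in basis points

# Assumption: the fold-switching fraction applies uniformly to the database.
low_estimate = total_predictions * low_basis_points // 10_000
high_estimate = total_predictions * high_basis_points // 10_000

print(f"low estimate:  {low_estimate:,}")
print(f"high estimate: {high_estimate:,}")
```

The two bounds come out at 1.07 million and 8.56 million, which matches the rough range quoted in the text.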

Open questions:

❓ Does AF-Cluster effectively distinguish between functionally relevant conformations and minor variations?

❓ Does AF-Cluster work well for proteins of all sizes, or are there limitations for very large or very small proteins?

❓ How can AF-Cluster be used to aid in the development of new drugs that target specific protein conformations?

Paper: Predicting multiple conformations via sequence clustering and AlphaFold2

Code: AF-Cluster

Colab Notebook: AFcluster.ipynb