AFsample2: Sampling the Diverse Conformational Landscape of Proteins By Generative Models

Written by Genophore | Jun 8, 2024 8:14:00 PM

AFsample2 is a generative model based on AlphaFold2 (AF2), capable of predicting multiple conformations of protein structures from sequence. AFsample2 achieves this by introducing more noise to the inference step of the AF2 neural network.

The ability to predict biologically relevant ensembles of protein structures would not only facilitate broader understanding of biological processes but also enable deeper insights into disease mechanisms, opening up new opportunities for targeted drug development.

To increase the diversity of structures predicted by AF2, AFsample2 randomly masks columns in the multiple sequence alignment (MSA) supplied to AF2, debiasing the model from its reliance on co-evolutionary signals. In other words, AFsample2 loosens the constraints that co-evolutionary signals in the input MSA impose on the prediction. This favors the prediction of alternative structural states of the query sequence, because breaking the covariance signals forces the network to arrive at different solutions.
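As a rough illustration of the column masking idea, here is a minimal Python sketch. It is not the AFsample2 code: the function name `mask_msa_columns`, the mask token `"X"`, the default masking fraction, and the choice to leave the query (first) row untouched are all assumptions made for this example.

```python
import numpy as np


def mask_msa_columns(msa, mask_fraction=0.15, mask_token="X",
                     keep_query=True, seed=None):
    """Randomly mask whole columns of an MSA given as equal-length strings.

    Assumptions for illustration only: the mask token, the default fraction,
    and keeping the query row (row 0) intact are not taken from the source.
    """
    if not msa or any(len(seq) != len(msa[0]) for seq in msa):
        raise ValueError("MSA must be non-empty with equal-length rows")

    rng = np.random.default_rng(seed)
    # One character per cell, shape (n_sequences, n_columns).
    arr = np.array([list(seq) for seq in msa])

    # Pick each column independently with probability mask_fraction.
    masked_cols = rng.random(arr.shape[1]) < mask_fraction

    # Overwrite the chosen columns, optionally skipping the query row.
    first_row = 1 if keep_query else 0
    arr[first_row:, masked_cols] = mask_token

    return ["".join(row) for row in arr], masked_cols


if __name__ == "__main__":
    toy_msa = ["MKTAYIAKQR", "MKTAYLAKQR", "MRTAYIAKQK"]
    masked, cols = mask_msa_columns(toy_msa, mask_fraction=0.3, seed=0)
    print("\n".join(masked))
```

Calling the function repeatedly with different seeds yields a different masked MSA for each inference run, which is what lets the downstream network see a different subset of covariance signals each time.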

The column masking approach employed in AFsample2 resembles the one used in SPEACH_AF. The difference is that SPEACH_AF introduces a sliding window of alanines (i.e. alanine scanning) at specific columns, chosen using prior knowledge of interacting residues from existing structural information or from contacts in generated models. AFsample2 requires no such prior knowledge.


In a benchmark involving the open-closed conformations data sets, AFsample2 enabled the prediction of alternative states for 17 out of 23 cases, without loss of preference for the dominant end-state. For membrane protein transporters, AFsample2 achieved improved alternate state predictions for 12 of 16 test cases.

The improved sampling by AFsample2 also raised the TM-score of predicted end-state conformations relative to experimental structures, improving predictions that previously scored 0.58 to 0.98. Further, AFsample2 improved the prediction and diversity of intermediate states by 70% compared to AF2.

Compared to other methods, including AFcluster, the quality of the models generated by AFsample2 was significantly better.

💪 While AFsample2 predicts protein ensembles, the model also offers a novel way to select a single alternate end-state structure from the generated conformations. This approach needs no experimental reference structure and follows a three-stage process:

1️⃣ Calculating the similarity of each conformation in the ensemble to the best model,
2️⃣ Confidence screening to filter out models below a certain threshold, and
3️⃣ Extremity selection to identify the model (alternative state) that is furthest from the most confident model.
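The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_alternate_state`, the use of a generic per-model confidence score, the caller-supplied `similarity` function (for example a TM-score wrapper), and the threshold value are all hypothetical and not taken from the AFsample2 code.

```python
def select_alternate_state(models, confidences, similarity, min_confidence):
    """Pick a reference-free alternate state from an ensemble.

    models         : list of conformations (any objects)
    confidences    : one confidence score per model (higher is better)
    similarity     : callable(model_a, model_b) -> float, higher = more alike
    min_confidence : models scoring below this are discarded (assumed value)

    Returns (best_index, alternate_index); alternate_index is None when no
    other model passes the confidence screen.
    """
    if not models or len(models) != len(confidences):
        raise ValueError("need one confidence score per model")

    # The most confident model serves as the reference ("best") state.
    best = max(range(len(models)), key=lambda i: confidences[i])

    # Stage 1: similarity of every conformation to the best model.
    sims = [similarity(m, models[best]) for m in models]

    # Stage 2: confidence screening.
    candidates = [i for i in range(len(models))
                  if i != best and confidences[i] >= min_confidence]
    if not candidates:
        return best, None

    # Stage 3: extremity selection, i.e. the least similar surviving model.
    alternate = min(candidates, key=lambda i: sims[i])
    return best, alternate


if __name__ == "__main__":
    # Toy 1-D "conformations": similarity falls off with distance.
    toy_models = [0.0, 0.1, 0.9, 1.0]
    toy_conf = [90.0, 85.0, 80.0, 40.0]
    toy_sim = lambda a, b: 1.0 - abs(a - b)
    print(select_alternate_state(toy_models, toy_conf, toy_sim, 70.0))
```

In the toy run, the last model is the furthest from the reference but is screened out for low confidence, so the selection falls on the most distant model that still passes the threshold.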

 

Open Questions and Future Directions:

  1. Enhanced Structural Diversity:
    • How can AFsample2's approach be further refined to predict an even broader range of conformational states?
    • Are there additional techniques that can be integrated with AFsample2 to improve the prediction of rare or transient conformations?
  2. Integration with Experimental Data:
    • How can AFsample2 be combined with experimental techniques such as cryo-EM or NMR to validate and refine its predicted ensembles?
    • What are the potential benefits and challenges of using AFsample2 in conjunction with high-throughput experimental screening methods?
  3. Applications in Drug Discovery:
    • How can AFsample2 be leveraged to identify novel druggable conformations and binding sites that are not apparent in static protein structures?
    • What are the implications of using AFsample2 in the early stages of drug discovery, particularly in virtual screening and lead optimization?
  4. Performance Across Diverse Protein Classes:
    • How well does AFsample2 perform across a wider range of protein families, including those with highly dynamic or disordered regions?
    • Can AFsample2's methodology be adapted to predict conformations for larger macromolecular complexes or multi-domain proteins?
  5. Usability and Accessibility:
    • What steps can be taken to make AFsample2 more user-friendly and accessible to researchers with varying levels of expertise in computational biology?
    • How can AFsample2 be integrated into existing computational pipelines to streamline its use in structural biology research?

By addressing these questions, the scientific community can continue to build on the advancements presented by AFsample2, pushing the boundaries of what is possible in protein structure prediction and drug discovery. The ongoing development and refinement of AFsample2 hold the promise of transforming our understanding of protein dynamics and enabling new therapeutic interventions.

Conclusion:

AFsample2 represents a significant advancement in the prediction of diverse protein conformations from sequence data. By introducing noise during the inference step of the AlphaFold2 neural network, AFsample2 overcomes the limitations imposed by co-evolutionary signals in input MSAs. This allows for the prediction of biologically relevant ensembles of protein structures, which can facilitate a deeper understanding of biological processes and disease mechanisms, and open up new opportunities for targeted drug development. The improved sampling capabilities of AFsample2 not only enhance the accuracy of end-state conformations but also significantly improve the diversity and prediction of intermediate states, positioning it as a powerful tool in the field of structural biology.

References:

Paper: AFsample2: Predicting multiple conformations and ensembles with AlphaFold2

Code: https://github.com/iamysk/AFsample2