AF2BIND is a deep learning method for the prediction of ligand binding sites on proteins.
AF2BIND builds on AlphaFold2 (AF2) by reusing the pairwise representation of the AF2 model. It was trained to identify the amino acid residues in a query protein that are likely to bind small-molecule ligands.
In doing so, AF2BIND draws on the structural knowledge that AF2 acquired for structure prediction and applies it to ligand binding site identification.
Traditionally, ligand binding sites in target proteins are predicted by superposing the structure of the target protein on the structure of a related homologue with a known bound ligand. Essentially, the binding site of the target is inferred from that of the known complex based on structural similarity. However, this approach fails when there is no known homologous structure with a bound ligand to use as a template for the query protein.
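The template-based idea described above can be sketched in a few lines. This is only an illustration, not the procedure of any particular tool: it assumes the superposition and the residue alignment have already been computed, and the function names, the 4 Å contact cutoff, and the toy coordinates are all hypothetical.

```python
import math

def contact_residues(residue_atoms, ligand_atoms, cutoff=4.0):
    """Return template residue ids with any atom within `cutoff` angstroms of the ligand.

    The 4.0 cutoff is an assumed, commonly used contact distance.
    """
    hits = []
    for res_id, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            hits.append(res_id)
    return sorted(hits)

def transfer_site(template_site, alignment):
    """Map template binding residues onto the query via a template -> query alignment.

    Template residues with no aligned partner in the query are dropped.
    """
    return sorted(alignment[r] for r in template_site if r in alignment)

# Toy template: residue id -> list of atom coordinates (made-up values).
template_atoms = {
    10: [(0.0, 0.0, 0.0)],
    11: [(3.0, 0.0, 0.0)],
    12: [(12.0, 0.0, 0.0)],
}
ligand = [(1.0, 1.0, 0.0)]            # bound ligand in the template complex
alignment = {10: 14, 11: 15, 12: 16}  # hypothetical template -> query mapping

site = contact_residues(template_atoms, ligand)
query_site = transfer_site(site, alignment)
print(query_site)
```

The sketch also makes the failure mode concrete: without a template complex there is no `ligand` and no `alignment`, so nothing can be transferred.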
This template-based approach is most limited for novel proteins and for those lacking well-characterized homologues. AF2BIND addresses this gap because it does not rely on pre-existing templates, which broadens its applicability.
Unlike other methods for the same task, AF2BIND is a logistic regression model trained directly on the pairwise representation of AF2. It does not rely on multiple sequence alignments, homology models, or prior knowledge of the true ligand for a target protein. The model was trained on labeled, non-redundant protein-ligand complex structures from the PDB.
Training on a non-redundant set is intended to help AF2BIND generalize across a wide variety of protein structures. Because it needs neither homology models nor multiple sequence alignments, the prediction process is simpler and avoids the computational cost of building them.
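To make the "logistic regression on per-residue features" idea concrete, here is a minimal sketch in plain Python. It is not the authors' training code: the two-dimensional toy features stand in for the much larger AF2 pairwise features, and the learning rate, epoch count, and data are assumptions chosen only for illustration.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, x):
    """P(bind) for one residue with feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(X, y, lr=0.1, epochs=500):
    """Fit weights by batch gradient descent on the mean log loss."""
    n_feat = len(X[0])
    weights, bias = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, label in zip(X, y):
            err = predict(weights, bias, x) - label  # gradient of log loss w.r.t. the logit
            for k in range(n_feat):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias

# Toy data (made up): binding residues (label 1) cluster around +1 on the
# first feature, non-binding residues (label 0) around -1; the second
# feature is pure noise.
random.seed(0)
X, y = [], []
for _ in range(50):
    X.append([random.gauss(1.0, 0.3), random.gauss(0.0, 1.0)])
    y.append(1)
    X.append([random.gauss(-1.0, 0.3), random.gauss(0.0, 1.0)])
    y.append(0)

weights, bias = train(X, y)
p_bind = predict(weights, bias, [1.0, 0.0])
p_non = predict(weights, bias, [-1.0, 0.0])
print(f"binding-like residue: {p_bind:.2f}, non-binding-like residue: {p_non:.2f}")
```

A linear model like this has few moving parts, so any predictive power has to come from the features themselves, which is consistent with the emphasis the text places on the AF2 pairwise representation.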
The authors also compared single-representation features from several models, including AF2, ESM2, and ESM-IF1. These features encode either the sequence or the structure of the target protein alone, without bait amino acids. Of the representations tested, the pairwise representation of AF2 was found to be the most suitable for the binding site prediction task.
This comparison shows that the choice of feature representation matters for accurate binding site prediction, and that AF2's pairwise representation performed best in this application.
AF2BIND takes as input the amino acid sequence of the target protein, its backbone structure, and the 20 individual canonical amino acids as baits functioning as surrogates for a small-molecule ligand.
Using canonical amino acids as baits mimics the presence of a ligand, which helps the model identify residues that are likely to interact with actual small-molecule ligands.
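The bait idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the AF2BIND implementation: it assumes the baits are appended after the target residues, that the pairwise representation is indexed as pair[i][j] with a feature vector per residue pair, and that per-residue features are formed from the target-to-bait entries. The names `build_input` and `residue_features`, the channel count, and the toy tensor are hypothetical, and the sketch ignores how the baits are kept separate from the target chain.

```python
BAITS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids, one-letter codes

def build_input(target_seq):
    """Append the 20 bait residues after the target sequence (assumed layout)."""
    return target_seq + BAITS

def residue_features(pair, target_len):
    """For each target residue i, concatenate pair[i][j] over all bait positions j."""
    n_total = target_len + len(BAITS)
    feats = []
    for i in range(target_len):
        row = []
        for j in range(target_len, n_total):
            row.extend(pair[i][j])
        feats.append(row)
    return feats

target = "MKTAYIAK"          # made-up 8-residue target
full_seq = build_input(target)
n = len(full_seq)            # 8 target residues + 20 baits
channels = 4                 # toy channel count, chosen for illustration

# Toy stand-in for a pairwise representation of shape (n, n, channels).
pair = [[[float(i * n + j)] * channels for j in range(n)] for i in range(n)]

features = residue_features(pair, len(target))
print(len(features), len(features[0]))
```

Each target residue ends up with one fixed-length vector (20 baits times the channel count), regardless of protein length, which is the kind of input a per-residue classifier can consume.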
The output of AF2BIND is a probability score, P(bind), for each residue of the target protein, indicating how likely that residue is to bind a ligand.
Because the output is a probability rather than a binary label, researchers can prioritize residues for further investigation or validation.
In benchmarking tests involving GPCRs and bromodomains, AF2BIND accurately predicted the binding site residues for the test proteins. Notably, AF2BIND assigns varying probability scores to the predicted residues, which gives an inherent ranking of the residues most likely to engage ligands.
These benchmarking results suggest that AF2BIND is effective in practice on challenging protein classes such as GPCRs and bromodomains. The inherent ranking adds further utility by pointing researchers towards the most promising binding site residues.
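As a small illustration of how per-residue P(bind) scores yield a ranking, the sketch below sorts residues by score and keeps those above a cutoff. The sequence, the scores, the function name `rank_residues`, and the 0.5 threshold are all made up for the example and are not taken from AF2BIND's output.

```python
def rank_residues(sequence, p_bind, threshold=0.5):
    """Return (position, amino acid, score) tuples sorted by descending score.

    Positions are 1-based; residues scoring below `threshold` are dropped.
    The 0.5 default is an assumed cutoff, not a recommended value.
    """
    ranked = sorted(
        ((i + 1, aa, p) for i, (aa, p) in enumerate(zip(sequence, p_bind))),
        key=lambda r: r[2],
        reverse=True,
    )
    return [r for r in ranked if r[2] >= threshold]

sequence = "MKTAYIAK"                                      # toy sequence
p_bind = [0.02, 0.91, 0.10, 0.55, 0.97, 0.30, 0.05, 0.62]  # made-up scores

top = rank_residues(sequence, p_bind)
for pos, aa, p in top:
    print(f"{aa}{pos}\t{p:.2f}")
```

Lowering the threshold widens the candidate site, while raising it keeps only the residues the model is most confident about.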
Open Questions:
AF2BIND offers a way to identify potential ligand binding sites in target proteins with or without ligand-bound homologous structures. Could it also facilitate the identification of cryptic pockets in drug targets?
Cryptic pockets are hidden binding sites that are not apparent in the static structure of a protein. If AF2BIND can identify them, this could have significant implications for drug discovery, potentially opening new avenues for therapeutic intervention in targets previously deemed undruggable.
AF2BIND exemplifies the fusion of deep learning and structural biology to address a critical challenge in drug discovery. Its ability to predict ligand binding sites without relying on homologous structures is a significant advance, and its versatility, together with its possible use for uncovering cryptic pockets, makes it a valuable tool for researchers and drug developers alike.
Paper: AF2BIND: Predicting ligand-binding sites using the pair representation of AlphaFold2
Code: https://github.com/sokrypton/af2bind
Notebook: AF2BIND: Prediction of ligand-binding sites using AlphaFold2