Bryant et al. recently published Umol, an AI-based method that predicts structures of protein-ligand complexes from sequence information only.
Umol generates structures of protein-ligand complexes by simultaneously folding the protein and the ligand, using only the protein's amino acid sequence and the ligand's SMILES string as input. Umol extends the Evoformer module of the AlphaFold2 structure prediction network but does not take template structures as input.
Umol also shares some similarities with RoseTTAFold All-Atom (RFAA), except that Umol does not use a 3D track, template structures, or crystallographic ligand data as input. Notably, Umol can also predict the protein-ligand conformation at a specified receptor binding pocket when that pocket is known. Umol therefore offers two modes: Umol for blind prediction and Umol-pocket for pocket-directed prediction.
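For Umol-pocket, the known pocket has to be supplied in some concrete form. A minimal sketch, assuming the pocket is encoded as zero-based residue indices and defined as the residues whose Cβ atom lies within a distance cutoff of a known ligand pose. The helper name `pocket_residue_indices` and the 10 Å default are illustrative assumptions, not Umol's API; check the Umol README for the exact pocket definition and input format it expects.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def pocket_residue_indices(
    cb_coords: Sequence[Coord],
    ligand_coords: Sequence[Coord],
    cutoff: float = 10.0,
) -> list[int]:
    """Hypothetical helper: zero-based indices of residues whose C-beta atom
    lies within `cutoff` angstroms of any ligand atom.

    Assumes one C-beta coordinate per residue (glycine, which has no C-beta,
    would need a substitute such as its C-alpha; not handled here).
    """
    pocket: list[int] = []
    for index, cb in enumerate(cb_coords):
        # Keep the residue if any ligand atom is close enough to its C-beta.
        if any(math.dist(cb, atom) <= cutoff for atom in ligand_coords):
            pocket.append(index)
    return pocket


# Toy example: three residues, one ligand atom (coordinates in angstroms).
cb_positions = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand_atoms = [(2.0, 0.0, 0.0)]
print(pocket_residue_indices(cb_positions, ligand_atoms))  # [0, 1]
```

The cutoff controls how broadly the pocket is drawn: a larger value pulls in more of the surrounding residues.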
Predicting the binding conformation of ligands, including drug molecules, is an essential step in the drug discovery and development pipeline. It is typically done with computational molecular docking tools, which make it possible to quickly evaluate and screen candidate drug molecules before advancing them to experimental assessment.
Conventional docking methods suffer from several limitations. First, they require a high-quality, experimentally determined receptor (protein) structure. Second, although proteins are highly dynamic, most industry-standard molecular docking tools treat the receptor as a rigid, or at best partially rigid, entity. Hence, for docking to succeed, the input receptor structure needs to be of high quality and as close as possible to the physiological state.
Experimental structures are available for only a limited set of proteins, so computationally predicted structures are used when no PDB structure exists. Although recent AI-based protein structure prediction methods can predict protein structures with high accuracy, predicted structures often perform poorly in molecular docking calculations, with success rates falling by half.
The figure showcases Umol's performance in predicting the binding conformation of ligands within protein structures. The left two columns show predictions from blind Umol, while the right two columns show results from Umol-pocket, which directs predictions to a known receptor binding pocket. The highlighted regions show the ligand placement together with the corresponding ligand RMSD (LRMSD) values, illustrating the improved accuracy of Umol-pocket over the standard Umol approach.
Because Umol does not require an input structure, neither the receptor nor the ligand is constrained to a fixed conformation: both are modeled flexibly at all-atom resolution. Umol also outputs receptor and ligand pLDDT scores, which serve as confidence metrics for evaluating and ranking predicted models.
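Ranking models by a confidence score can be sketched in a few lines. This assumes the per-atom ligand pLDDT values have already been extracted from the Umol outputs (how they are stored is not covered here) and uses the mean as one simple aggregation choice; `rank_by_ligand_plddt` is a hypothetical helper, not part of Umol.

```python
from statistics import fmean


def rank_by_ligand_plddt(models: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Hypothetical helper: rank models by mean per-atom ligand pLDDT,
    highest (most confident) first. Models with no ligand scores are skipped."""
    scored = [(name, fmean(scores)) for name, scores in models.items() if scores]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy per-atom ligand pLDDT values for two predicted models.
predictions = {
    "model_a": [62.0, 70.0, 66.0],
    "model_b": [81.0, 79.0, 84.0],
}
for name, score in rank_by_ligand_plddt(predictions):
    print(f"{name}: {score:.1f}")
# model_b: 81.3
# model_a: 66.0
```

A mean is sensitive to a few poorly placed atoms; a median or minimum would be stricter alternatives.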
In a benchmark of 428 diverse protein-ligand complexes, blind Umol achieved a success rate (ligand RMSD below 2 Å) of 28 %, and Umol-pocket achieved 45 %, compared with 24 % for NeuralPLexer1 and 42 % for RFAA (with template information provided). The conventional docking method AutoDock Vina reached a higher success rate of 52 %, but it requires both an experimental holo-protein structure and a defined target region in which to dock the ligand.
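The success-rate metric can be made concrete with a short sketch. It assumes predicted and reference ligand poses are already in the same protein-aligned frame with atoms in matching order; it performs no superposition or symmetry correction, so the paper's exact evaluation protocol may differ. Both function names are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD over matched ligand atoms.

    Assumes both poses are already in the same protein-aligned frame and list
    atoms in the same order; no superposition or symmetry correction is done.
    """
    if not pred or len(pred) != len(ref):
        raise ValueError("poses must be non-empty with matching atom counts")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def success_rate(rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of complexes whose ligand RMSD is below `cutoff` angstroms."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(r < cutoff for r in rmsds) / len(rmsds)


# Toy pose shifted by 1 angstrom along z: RMSD is exactly 1.0.
print(ligand_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                  [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # 1.0
# Toy RMSDs for four complexes: two are below 2 angstroms.
print(success_rate([0.8, 1.9, 2.5, 4.0]))  # 0.5
```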
At a cut-off of 2.35 Å ligand RMSD, Umol-pocket outperforms all the other methods. At a cut-off of 3 Å, Umol had a success rate of 69 % compared to 58 % for Vina.
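Because a success rate depends on the chosen RMSD cutoff, two methods can swap ranks as the cutoff moves. The sketch below computes success rates over several cutoffs and lists where one method beats another; the RMSD values are made up for illustration (not the benchmark data), and the function names are hypothetical.

```python
from typing import Sequence


def success_curve(rmsds: Sequence[float], cutoffs: Sequence[float]) -> dict[float, float]:
    """Success rate (fraction of ligand RMSDs below each cutoff) per cutoff."""
    n = len(rmsds)
    return {c: sum(r < c for r in rmsds) / n for c in cutoffs}


def cutoffs_where_better(
    rmsds_a: Sequence[float], rmsds_b: Sequence[float], cutoffs: Sequence[float]
) -> list[float]:
    """Cutoffs at which method A has a strictly higher success rate than B."""
    curve_a = success_curve(rmsds_a, cutoffs)
    curve_b = success_curve(rmsds_b, cutoffs)
    return [c for c in cutoffs if curve_a[c] > curve_b[c]]


# Toy RMSDs (angstroms) for two hypothetical methods on the same five complexes.
method_a = [1.0, 2.2, 2.3, 2.8, 5.0]
method_b = [0.5, 1.5, 3.5, 4.0, 6.0]
cutoffs = [2.0, 2.35, 3.0]
print(success_curve(method_a, cutoffs))  # {2.0: 0.2, 2.35: 0.6, 3.0: 0.8}
print(success_curve(method_b, cutoffs))  # {2.0: 0.4, 2.35: 0.4, 3.0: 0.4}
print(cutoffs_where_better(method_a, method_b, cutoffs))  # [2.35, 3.0]
```

In this toy case method B wins at 2 Å while method A wins at looser cutoffs, which is why reporting several cutoffs gives a fuller picture than a single threshold.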
Umol's ligand pLDDT scores also showed a strong negative correlation with measured binding affinity (Kd), with a Pearson correlation of up to -0.77: higher ligand pLDDT tended to accompany lower Kd, i.e., tighter binding. For a set of 45 held-out targets, the median Kd was 30 nM for predictions with ligand pLDDT > 70, versus more than 500 nM for those with ligand pLDDT < 60. This suggests that ligand pLDDT could serve as an indicator of ligand binding affinity.
While conventional energy-based methods still outperform Umol when high-quality experimental structures are available, Umol represents an important advance in AI-based protein-ligand structure prediction from sequence information only.
Resources
Paper: https://www.nature.com/articles/s41467-024-48837-6
GitHub: https://github.com/patrickbryant1/Umol
Colab Notebook: https://colab.research.google.com/github/patrickbryant1/Umol/blob/master/Umol.ipynb