Evaluating Docking: Why RMSD Isn't Enough According to PoseBusters

Deep Learning Docking: Speed and Accuracy Gains, But a Hidden Flaw

Deep learning (DL) has revolutionized many fields, including protein-ligand docking, a crucial step in drug discovery. Recent DL-based docking methods have reported impressive gains in speed and accuracy, often surpassing classical methods when judged by root-mean-square deviation (RMSD) to the native pose, where a prediction within 2 Å of the crystal ligand is conventionally counted as a success.

However, a new study by Buttenschoen et al. reveals a hidden limitation in these DL methods. The authors introduce PoseBusters, an evaluation suite that goes beyond RMSD and assesses the chemical and physical plausibility of docked ligands using RDKit, a popular cheminformatics toolkit.

Here's where things get interesting:

DL Methods Excel in Raw Accuracy (RMSD): When evaluated with RMSD alone, DL methods such as DiffDock can outperform classical tools such as AutoDock Vina and GOLD. This suggests they can place ligands very close to the native pose in space.
But They Generate Unrealistic Poses: PoseBusters exposes a major weakness. These DL methods often produce poses that violate basic chemical rules or have implausible geometries, such as distorted bond lengths, non-planar aromatic rings, or clashes with the protein, even when the RMSD is good. A likely reason is that their training data and architectures do not fully capture the physics of protein-ligand interactions.
Classical Methods Have Built-in Safeguards: In contrast, classical methods such as Vina and GOLD include terms in their scoring functions that penalize unrealistic ligand conformations. These terms act as a safeguard against physically impossible poses, even if they might slightly reduce raw accuracy (RMSD).
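To make the gap between RMSD and physical plausibility concrete, here is a minimal sketch in plain Python. The four-atom chain and its coordinates are invented for illustration and are not taken from the paper, and the RMSD here is the simple unaligned, atom-ordered form rather than the symmetry-corrected variant that docking benchmarks normally use.

```python
import math

def rmsd(pred, ref):
    """Root-mean-square deviation (in Å) between two equally ordered coordinate lists."""
    if len(pred) != len(ref):
        raise ValueError("pose and reference must have the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))

def bond_length(coords, i, j):
    """Distance (in Å) between atoms i and j of one pose."""
    return math.dist(coords[i], coords[j])

# Toy reference: a straight four-atom chain with 1.5 Å bonds (hypothetical data).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
# Toy prediction: atom 1 has been pushed almost on top of atom 0.
predicted = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]

pose_rmsd = rmsd(predicted, reference)
short_bond = bond_length(predicted, 0, 1)

print(f"RMSD to reference: {pose_rmsd:.2f} Å")
print(f"Bond 0-1 length:   {short_bond:.2f} Å")
```

The toy pose sits well inside the usual 2 Å success threshold, yet one of its bonds is far too short to be chemically sensible. An RMSD-only evaluation cannot see this kind of failure.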

Figure: Example of a prediction improved by post-docking energy minimization. The initial prediction from Uni-Mol, one of the DL methods evaluated (RMSD 2.0 Å), is depicted in white, the optimized prediction (RMSD 1.1 Å) is in pink, and the reference crystal ligand is shown in light blue. The optimization flattens the aromatic rings and shortens the leftmost bond, enabling the prediction to pass all PoseBusters checks.

The PoseBusters Benchmark:

PoseBusters takes docked ligand structures, the true ligand structure, and the protein structure as input and performs three key sets of checks:

Chemical Validity and Consistency: Ensures the docked ligand passes RDKit's sanitization and still has the same formula, bonds, and stereochemistry as the input molecule.
Intramolecular Properties: Analyzes the ligand's geometry (bond lengths, bond angles, planarity of aromatic rings and double bonds), internal steric clashes, and energy (calculated using a force field).
Intermolecular Interactions: Checks for clashes and unrealistic overlap between the ligand and the protein or its cofactors.
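The flavour of the second and third kinds of check can be sketched in a few lines of plain Python. This is a simplified illustration and not the PoseBusters implementation: the function names, the ±25% bond-length tolerance, the 0.75 scaling of the van der Waals radii, and all coordinates are assumptions chosen for the example. The chemical validity checks are left out because they need a cheminformatics toolkit such as RDKit.

```python
import math

# Approximate van der Waals radii in Å (Bondi values), small illustrative subset.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}

def bond_lengths_ok(atoms, bonds, tolerance=0.25):
    """Intramolecular check: every bond is within ±tolerance of its ideal length.

    atoms: list of (element, (x, y, z)); bonds: list of (i, j, ideal_length_in_Å).
    """
    for i, j, ideal in bonds:
        length = math.dist(atoms[i][1], atoms[j][1])
        if not (1 - tolerance) * ideal <= length <= (1 + tolerance) * ideal:
            return False
    return True

def has_clash(ligand, protein, scale=0.75):
    """Intermolecular check: any ligand-protein pair closer than scale * (r_i + r_j)."""
    for lig_el, lig_xyz in ligand:
        for prot_el, prot_xyz in protein:
            limit = scale * (VDW_RADII[lig_el] + VDW_RADII[prot_el])
            if math.dist(lig_xyz, prot_xyz) < limit:
                return True
    return False

# Hypothetical three-atom ligand fragment and two one-atom "proteins".
ligand = [("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)), ("O", (2.2, 1.2, 0.0))]
bonds = [(0, 1, 1.53), (1, 2, 1.43)]
protein_far = [("N", (6.0, 0.0, 0.0))]
protein_close = [("N", (3.0, 1.2, 0.0))]

print("bond lengths plausible:", bond_lengths_ok(ligand, bonds))
print("clash with distant atom:", has_clash(ligand, protein_far))
print("clash with nearby atom:", has_clash(ligand, protein_close))
```

Each check returns a simple pass or fail, so a pose can be rejected for being physically implausible regardless of how close it sits to the crystal ligand.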

With PoseBusters, Buttenschoen et al. provide a more comprehensive evaluation that goes beyond spatial positioning (RMSD). Their results highlight a critical area for improvement in DL docking methods: placing ligands in the correct spot is not enough, and the poses must also be physically realistic. Addressing this limitation will be crucial for the continued development of these methods and their real-world application in drug discovery.
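For readers who want to try the tool, the sketch below wraps the Python interface described in the PoseBusters documentation. The `PoseBusters` class, the `"redock"` configuration, and the `bust` method come from that documentation as I understand it, so check it against the version you install; the wrapper name `bust_pose` and the file names in the usage note are hypothetical.

```python
from pathlib import Path

def bust_pose(pred_sdf, true_sdf, protein_pdb):
    """Run the PoseBusters re-docking checks on one predicted ligand file.

    Returns the report table produced by PoseBusters (a pandas DataFrame).
    """
    # Imported lazily so this module still loads where posebusters is not installed
    # (install with: pip install posebusters).
    from posebusters import PoseBusters

    # "redock" compares predictions against a reference ligand inside a protein.
    buster = PoseBusters(config="redock")
    return buster.bust([Path(pred_sdf)], Path(true_sdf), Path(protein_pdb))
```

A call such as `bust_pose("ligand_pred.sdf", "ligand_true.sdf", "protein.pdb")` should return one row per predicted pose, with a column for each check. A pose counts as plausible only if every check in its row passes.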


Overall, the message is clear: While DL docking methods show promise in terms of speed and raw accuracy, they currently struggle with generating physically plausible poses. PoseBusters offers a valuable tool to address this limitation and guide the development of more robust and reliable DL docking methods for the future.

Note: AlphaFold 3, released recently, demonstrates exceptional ligand docking capabilities. Evaluated using PoseBusters, it surpasses traditional docking software such as AutoDock Vina and GOLD, as well as other machine learning methods, by generating a significantly higher number of chemically valid poses. Read more about this development in our recent post (link here).

References and Resources

Paper: PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences

GitHub: https://github.com/maabuu/posebusters

PyPI: https://pypi.org/project/posebusters

Documentation: https://posebusters.readthedocs.io/en/latest