Recent AI Methods in Antibody Structure Prediction and De Novo Generation
Antibodies (Abs) are essential therapeutic agents with applications in various treatments, including cancer, autoimmune disorders, and viral infections. The development of these therapeutic antibodies is complex, requiring the optimization of multiple objectives such as efficacy, manufacturability, and safety. Recently, significant advancements have been made in computational antibody design, particularly through the use of machine learning and AI-based models. This blog explores four innovative methods in antibody structure prediction and de novo generation: AntiFold, IgDiff, GeoAB, and tFold.
AntiFold: Superior Structure-Based Antibody Sequence Design for Enhanced CDR Recovery
Høie et al. recently introduced AntiFold, an antibody-specific model for structure-based sequence design built on the ESM-IF1 inverse folding model. ESM-IF1 is a general-purpose model from Meta that designs sequences for a given protein structure; it was trained on experimentally determined protein structures and over 12 million AlphaFold2-predicted structures.
As outlined above, developing therapeutic Abs is a challenging protein design problem: it involves not only sequence selection but also the optimization of multiple objectives, including efficacy, ease of manufacturing, and safety. Computational design is a promising route to generating target-specific Abs. Although AI-based computational protein design has progressed significantly, with models like ProteinMPNN and ESM-IF1 achieving highly accurate sequence design for a wide range of proteins, the accurate design of Abs remains elusive.
Several machine learning methods have demonstrated the ability to address some of the steps in the Ab development process, including reducing immunogenicity and aggregation. Other important tasks include the selection of sequence mutations that fit target structures and the optimization of structure-based properties such as stability and antigen binding.
The AntiFold model was built by fine-tuning ESM-IF1 on datasets of experimental Ab structures (2,074 complexes from SAbDab) augmented with predicted Ab structures (147,458 ABodyBuilder2-modeled structures from OAS). Given a target Ab structure, the model predicts optimized sequences that would fold into the target structure. Sequences designed by AntiFold, when refolded using a structure prediction model, showed high structural similarity to the expected structure.
In benchmarking tests against existing inverse folding models, including ProteinMPNN, ESM-IF1, and AbMPNN, AntiFold showed the highest sequence recovery of Ab complementarity-determining regions. When evaluated in a zero-shot antibody-antigen binding affinity test, AntiFold demonstrated better correlations than other methods. The accuracy of AntiFold was further improved when information about the target antigen was provided. AntiFold also shows promise in Ab optimization as it assigns low probabilities to residue mutations that ablate Ab-Ag binding.
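The sequence recovery metric mentioned above can be illustrated with a minimal sketch. It assumes the native and designed sequences are already aligned to the same length and that the CDR positions are known; the function name, the toy sequences, and the index range are hypothetical and are not taken from the AntiFold paper or code.

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of scored positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned to the same length")
    # Score every position unless a subset (e.g. a CDR) is given.
    positions = list(range(len(native))) if positions is None else list(positions)
    if not positions:
        raise ValueError("no positions to score")
    matches = sum(native[i] == designed[i] for i in positions)
    return matches / len(positions)


# Hypothetical toy data: a 12-residue fragment with a made-up CDR span.
native = "ARDYYGSSYFDY"
designed = "ARDYYGTSYFDV"
cdr_positions = range(2, 10)  # illustrative 0-based indices, not a real numbering scheme

overall = sequence_recovery(native, designed)                 # 10 of 12 positions match
cdr_only = sequence_recovery(native, designed, cdr_positions)  # 7 of 8 positions match
print(f"overall={overall:.3f} cdr={cdr_only:.3f}")
```

Restricting the score to CDR positions matters because framework residues are highly conserved and easy to recover, so a whole-sequence average can hide poor performance in the loops that determine binding.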
It remains to be seen how such a powerful model could be integrated into experimental settings for therapeutic Ab discovery, and into multi-objective optimization using reinforcement learning and active learning.
IgDiff: De novo design of antibody variable domain using a diffusion model
Cutting et al. recently reported IgDiff, a diffusion model for generating backbone structures of antibody variable regions. IgDiff is based on FrameDiff, a general protein backbone generation framework previously developed by the Jaakkola lab. The ultimate goal of protein design is to engineer proteins that fulfill specific functional characteristics. While protein design has seen impressive progress in inverse folding tasks thanks to advances in deep learning techniques, de novo generation of protein structures remains a challenging problem.
Diffusion models, following their breakthroughs in image generation, have shown considerable promise in computational protein design owing to their accuracy and speed, and multiple state-of-the-art diffusion-based methods for protein sequence and structure design have been developed. However, the design of therapeutic antibodies is considerably more challenging than that of other protein classes, and general protein design methods struggle to perform well.
IgDiff adapts FrameDiff to antibody design by fine-tuning it on synthetic antibody structures. The training of IgDiff focused specifically on the complementarity-determining region (CDR) loops, which are essential for defining the binding properties of Abs.
IgDiff generates backbones of paired variable regions, each comprising a heavy and a light chain. Structures designed by IgDiff were evaluated for designability (plausibility), novelty, and diversity.
IgDiff is capable of both conditional and unconditional structure generation. Sequences for structures designed by IgDiff were predicted using AbMPNN, a model fine-tuned for Ab inverse folding based on ProteinMPNN. The best predicted sequence for each generated Ab shows <2 Å RMSD (i.e., RMSD of the predicted model to the initial structure) over the entire Ab, with 88% having <2 Å in each Ab region, including the CDRs. All 28 designed Ab sequences selected for experimental validation expressed and yielded sufficient concentrations for downstream characterization.
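To make the RMSD criterion concrete, here is a minimal sketch of how an RMSD value and the fraction of designs under a 2 Å cutoff could be computed. It assumes the two coordinate sets are already superimposed and matched atom for atom (a real pipeline would first run an alignment step such as the Kabsch algorithm); the function names and toy coordinates are hypothetical and do not come from the IgDiff code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed, equal-length coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def fraction_below(rmsd_values, threshold=2.0):
    """Share of designs whose RMSD is strictly below the threshold (in Å)."""
    if not rmsd_values:
        raise ValueError("no RMSD values given")
    return sum(r < threshold for r in rmsd_values) / len(rmsd_values)


# Toy backbone atoms: the refolded model is shifted by exactly 1 Å along y.
generated = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
refolded = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
toy_rmsd = rmsd(generated, refolded)

# Made-up per-design RMSD values, for illustration only.
share = fraction_below([0.8, 1.5, 2.4, 1.9])
print(f"rmsd={toy_rmsd:.2f} share_below_2A={share:.2f}")
```

The same helper can be applied per region (for example, to the atoms of each CDR separately) to check whether a design stays under the cutoff everywhere rather than only on average.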
IgDiff outperformed RFDiffusion in all evaluated design tasks, including:
- design of the entire light chain for a given heavy chain
- design of the CDR loops for a given fixed heavy and light chain
- design of a variable-length CDR-H3 loop given the remaining variable region
GeoAB: In silico method for antibody design and rationalizing affinity maturation
Lin et al. recently proposed GeoAB, a method for the realistic design and optimization of antibody (Ab) complementarity-determining regions (CDRs). GeoAB implements a co-design framework that predicts both the structure of a CDR and the optimized amino acid sequences for that structure. It achieves this by generating CDR structures, predicting mutation effects, and predicting structures of optimized CDRs based on these mutations.
Computational protein modeling and design have seen significant advancements in recent years, largely due to the development and achievements of AlphaFold2 (AF2), a deep-learning model for protein structure prediction. Several protein design methods have been inspired by or built upon the AF2 approach. While general protein design methods do not generalize well to therapeutic Ab design, there have been increasing efforts to develop Ab-specific computational protein design methods. However, many challenges still impede progress in this area, including accurate generation and prediction of Ab structures and the design of Ab sequences with target functions.
GeoAB addresses two key challenges in this field: the generation of designable and plausible Ab structures and the reliability of computational Ab affinity maturation. Computational affinity maturation involves optimizing Abs by introducing mutations into the amino acid sequences of the CDRs to enhance Ab binding affinity towards target antigens.
GeoAB comprises two primary models: GeoAB-Designer and GeoAB-Optimizer. GeoAB-Designer generates CDR structures for a given Ab. The generated structures are geometry-optimized using two modules: Geo-Initializer, a generative geometry initializer, and Geo-Refiner, a graph neural network position refiner. GeoAB-Optimizer, a structure-aware model, performs affinity maturation of the Ab by predicting the ΔΔG of mutations that optimize the target binding property. The CDR structures of selected mutants are then generated by GeoAB-Optimizer using a network architecture similar to that of Geo-Refiner.
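The mutation-selection step in affinity maturation can be sketched as a simple ranking over predicted ΔΔG values. This is an illustration only: it assumes the common convention ΔΔG = ΔG(mutant) − ΔG(wild type), so that negative values indicate stronger binding, which may differ from the convention used by GeoAB. The function name, mutation labels, and numbers are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_mutations(predicted_ddg: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k mutations predicted to improve binding, most favorable first.

    Assumes negative ddG means stronger binding (an assumed sign convention).
    """
    improving = [(mutation, ddg) for mutation, ddg in predicted_ddg.items() if ddg < 0.0]
    improving.sort(key=lambda item: item[1])  # most negative ddG first
    return improving[:top_k]


# Made-up predictions in kcal/mol, keyed by "chain:wild-type, position, mutant".
predicted = {
    "H:Y102F": -0.8,
    "H:S31A": 0.3,
    "L:N92D": -1.2,
    "H:G55W": 1.9,
    "L:T94S": -0.1,
}
selected = rank_mutations(predicted, top_k=2)
print(selected)
```

In a co-design setting, the selected mutants would then be passed to a structure generator so that each candidate sequence is paired with a predicted CDR conformation.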
GeoAB demonstrated state-of-the-art performance on several benchmark tests. In CDR refinement assessments, Geo-Refiner outperformed RefineGNN, MEAN, DyMEAN, C-RefineGNN, and C-DyMEAN. On CDR generative tasks, GeoAB-Designer outperformed DiffAB on all evaluated metrics. For mutation-based redesign, GeoAB surpassed other methods on multiple metrics, including the physics-based Rosetta ddG and FoldX, as well as the deep learning models DDG-Pred and RDE-PPI.
tFold: Fast and Accurate Prediction of Structures of Antibodies and Antibody-Antigen Complexes from Sequence
Wu et al. recently introduced tFold, a method for the prediction of structures of antibodies (Abs) and antibody-antigen (Ab-Ag) complexes.
The model utilizes an end-to-end approach for predicting 3D atomic-resolution structures of Abs and Ab-Ag complexes by employing a pre-trained large protein sequence language model. This model extracts intra-chain and inter-chain residue-residue contact information and evolutionary relationships, eliminating the need for time-consuming multiple sequence alignment searches. tFold also integrates transformer-based structure prediction methods supported by specialized flexible docking modules. These integrations were developed into models named tFold-Ab for Ab (and nanobody) structure prediction and tFold-Ag for Ab-Ag (and nanobody-Ag) complex structure prediction. Each of these components takes sequence as input and produces per-residue confidence scores.
Antibodies are essential in the adaptive immune system as they specifically recognize and neutralize antigens. The specificity of Abs toward Ags positions them as highly promising therapeutics. Thus, they have been of utmost interest to the scientific community, with the prediction of structures of Ab-Ag complexes being a significant priority.
The accurate prediction of structures of Abs and Ab-Ag complexes is fundamental to enabling data-driven investigations of Abs and the development of therapeutic Abs. While structure prediction for protein monomers has seen massive success and progress has been made in predicting protein complexes, the fast and accurate prediction of Ab-Ag complexes remains a challenge. Current molecular docking methods fail to generalize, and AI-based prediction methods like AlphaFold (AF)-Multimer produce suboptimal results.
tFold-Ab was compared against generalist prediction methods (AF-Multimer, EquiFold, Uni-Fold_MuSSe, ESMFold, OmegaFold, and HelixFold-Single) and Ab-specific methods (IgFold, DeepAb, and ImmuneBuilder). It achieves superior accuracy (a 1.6% RMSD reduction in the CDR-H3 region) and significantly higher speeds (>1000-fold) in predicting Ab structures. Similarly, tFold-Ag was compared against AF-Multimer, Uni-Fold_MuSSe, and RoseTTAFold2, as well as conventional and AI-based docking methods (ZDock, ClusPro, HDock, EquiDock, dyMEAN, and ColabDock). It achieves a 37% increase in DockQ score while predicting antibody-antigen complex structures >10-fold faster than AF-Multimer.
Encouraged by the performance and speed of tFold, the authors extended its capabilities to structure-based virtual screening of binding Abs and to de novo structure and sequence co-design of therapeutic Abs. Experimental results demonstrate its potential as a high-throughput tool for both tasks.
References