
FAbCon: Can an antibody-specific generative foundation model facilitate antibody sequence understanding?

Written by Genophore | May 27, 2024 6:38:00 PM

A team led by Jacob Galson has just released FAbCon, a set of generative large language models (LLMs) for antibody binding prediction and design. The largest FAbCon model comprises 2.4 billion parameters and is based on Falcon, an LLM for natural language processing applications. Human antibodies form a uniquely large and diverse class of proteins, able to recognize the wide range of antigens that the human body may be exposed to.

The ability to decode the language of human antibodies would enable a deeper understanding of diseases and, in turn, the ability to treat them in a targeted manner. LLMs offer the capacity to analyze and encode the patterns in the large antibody sequence corpus, with the possibility of scaling performance with model size.

FAbCon is a decoder-only, antibody-specific LLM comprising more than 2 billion parameters, built on Falcon, an LLM for natural language processing. FAbCon also shares similarities with other state-of-the-art generative protein language models such as ProGen2 and ProtGPT2.

 

FAbCon was pre-trained on paired (heavy and light chains in a single sequence) and unpaired (heavy or light chain only) antibody sequences. After pre-training, the model exhibited a mature representation of antibody sequences, which makes it useful for diverse antibody modeling and design tasks.

FAbCon consists of three models with different parameter counts:

1️⃣ FAbCon-small (144 million parameters)

2️⃣ FAbCon-medium (297 million parameters)

3️⃣ FAbCon-large (2.4 billion parameters)

The models have been specifically evaluated for antigen binding prediction and antibody design capabilities. All the models can handle heavy-chain-only, light-chain-only, and paired-chain inputs.

💉 The FAbCon models were fine-tuned for binding prediction on datasets covering three antigens: HER2, SARS-CoV-2 spike peptide, and IL-6. FAbCon-large demonstrated the best performance on two metrics, the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUROC), outperforming existing models.
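For readers less familiar with the two metrics, here is a minimal Python sketch of how AUROC and AUPR can be computed for a binary binder/non-binder classifier. It is not the evaluation code from the paper. The function names and the toy labels and scores are made up for illustration, and AUPR is approximated here by average precision, a common estimator that may differ from what the authors used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random binder is scored above a random non-binder."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count correctly ordered (binder, non-binder) pairs; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, used here as an estimate of AUPR."""
    # Rank predictions from most to least confident (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(y for _, y in ranked)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each true binder
    return total / n_pos


# Toy data (hypothetical): 1 = binder, 0 = non-binder.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print("AUROC:", auroc(toy_labels, toy_scores))
print("AUPR (average precision):", average_precision(toy_labels, toy_scores))
```

AUROC measures how well the scores rank binders above non-binders overall. AUPR focuses on the positive class, so it is usually the more informative of the two when binders are rare in the dataset.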

🏃‍♂️ On the generative de novo antibody design task, FAbCon-large surpasses ProGen2-OAS on OASis score, a measure of the humanness of generated antibodies.

❓ How does FAbCon’s generative capability for designing new antibodies compare with traditional biologics discovery methods in terms of speed, cost, and success rate in identifying viable, manufacturable therapeutic candidates?

❔ What specific challenges need to be addressed to integrate this model effectively into the current drug discovery pipeline?

Resources

Paper: A generative foundation model for antibody sequence understanding

Models (Hugging Face): https://huggingface.co/alchemab