Evaluating Drug-Likeness: Current Methods and Future Directions in Predicting Potential Therapeutics
Drug discovery depends on identifying molecules that have the potential to become effective therapeutics. With millions of candidate compounds to consider, researchers rely on methods that predict "drug-likeness": the set of properties that make a molecule a viable candidate for development. This article reviews the empirical and AI-based approaches currently used to evaluate drug-likeness, discusses their strengths and limitations, and outlines future directions for the field.
Drug-likeness encompasses the structural and physicochemical properties of a molecule that influence its behavior in biological systems. Predicting it helps researchers prioritize compounds that are more likely to succeed in later stages of development, which reduces the time and cost of experimental testing and streamlines the discovery pipeline.
Empirical methods are based on observed trends and properties of known drugs. They provide straightforward guidelines that can be applied early in the discovery process.
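Introduced by Christopher Lipinski in 1997, the Rule of Five (RO5) is one of the most widely recognized guidelines in medicinal chemistry. It outlines four physicochemical criteria associated with good oral bioavailability: a molecular weight of no more than 500 Da, a calculated logP of no more than 5, no more than 5 hydrogen-bond donors, and no more than 10 hydrogen-bond acceptors.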
RO5 is not a strict rule but rather a general guideline. While exceptions exist, it remains a valuable starting point for many drug discovery projects. It helps identify compounds with properties conducive to absorption and permeation but does not guarantee success, nor does it account for all aspects of drug-likeness, such as metabolic stability or target specificity.
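The guideline can be expressed as a simple check. The sketch below assumes the four descriptors have already been computed by a cheminformatics toolkit; the `Descriptors` class and both function names are hypothetical, and the cutoffs are the commonly cited RO5 values. Tolerating one violation is a common convention, exposed here as an adjustable parameter rather than a fixed rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> List[str]:
    """Return the Rule of Five criteria that the molecule violates."""
    checks = {
        "mol_weight > 500": d.mol_weight > 500,
        "logp > 5": d.logp > 5,
        "h_bond_donors > 5": d.h_bond_donors > 5,
        "h_bond_acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat RO5 as a guideline: allow up to `max_violations` violations."""
    return len(ro5_violations(d)) <= max_violations


# Illustrative, approximate descriptor values for a small aspirin-like molecule.
small_molecule = Descriptors(mol_weight=180.2, logp=1.2,
                             h_bond_donors=1, h_bond_acceptors=4)
# Made-up values for a large, lipophilic molecule.
large_molecule = Descriptors(mol_weight=1200.0, logp=6.5,
                             h_bond_donors=6, h_bond_acceptors=12)
```

Returning the list of violated criteria, rather than a bare boolean, keeps the filter interpretable: a chemist can see why a compound was flagged and decide whether the exception is acceptable.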
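Developed by Bickerton et al., the Quantitative Estimate of Drug-likeness (QED) offers a more nuanced assessment than RO5 by providing a continuous score ranging from 0 to 1. It integrates multiple molecular properties, including molecular weight, lipophilicity (logP), hydrogen-bond donor and acceptor counts, polar surface area, the number of rotatable bonds, the number of aromatic rings, and the number of structural alerts.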
QED is based on the properties of known drugs and may have limitations when dealing with highly novel compounds. It effectively quantifies drug-likeness for molecules similar to existing drugs but may undervalue innovative structures that fall outside established norms.
Example: A compound with a QED score close to 1 is considered highly drug-like, but a novel molecule with unique features might receive a lower score despite potential therapeutic value.
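QED combines per-property desirability values, each between 0 and 1, through a weighted geometric mean. The sketch below shows only that aggregation step; the function name is hypothetical, and the desirability values and weights in the usage lines are made-up placeholders, not the fitted functions or weights from Bickerton et al.

```python
import math
from typing import Sequence


def weighted_geometric_mean(desirabilities: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Aggregate desirabilities in (0, 1] into a single score in (0, 1]."""
    if len(desirabilities) != len(weights):
        raise ValueError("desirabilities and weights must have equal length")
    if any(d <= 0.0 or d > 1.0 for d in desirabilities):
        raise ValueError("each desirability must lie in (0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0.0:
        raise ValueError("weights must sum to a positive value")
    # exp(sum(w_i * ln d_i) / sum(w_i)) is the weighted geometric mean.
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / total_weight)


# Placeholder inputs: one poor property among three ideal ones, equal weights.
score = weighted_geometric_mean([0.01, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])
```

Because the mean is geometric, a single very unfavorable property pulls the overall score down sharply, which is one reason an unusual but potentially valuable molecule can score poorly.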
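Proposed by Ertl et al., the Synthetic Accessibility (SA) score estimates how easy a molecule is to synthesize on a scale from 1 (easy to make) to 10 (very difficult to make). It combines fragment contributions derived from known molecules with a penalty for structural complexity.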
SA is useful for prioritizing compounds that are not only promising but also feasible to produce in the laboratory. However, it may not accurately predict synthesizability for novel compounds utilizing innovative synthetic methods.
Example: A simple molecule with common fragments would have a low SA score, indicating high synthetic accessibility.
Introduced by Lovering et al., Fsp³ measures the saturation of a molecule as the fraction of its carbon atoms that are sp³-hybridized.
While higher Fsp³ values have been associated with improved clinical success rates, this metric focuses on molecular complexity and does not account for other critical drug-like properties.
Example: Natural products often have high Fsp³ values due to their complex, saturated structures.
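Computed from atom counts, the metric is a single ratio. This is a minimal sketch that assumes the carbon counts have already been obtained from a cheminformatics toolkit; the function name `fsp3` is hypothetical.

```python
def fsp3(sp3_carbons: int, total_carbons: int) -> float:
    """Fraction of carbon atoms that are sp3-hybridized."""
    if total_carbons <= 0:
        raise ValueError("molecule must contain at least one carbon atom")
    if not 0 <= sp3_carbons <= total_carbons:
        raise ValueError("sp3_carbons must be between 0 and total_carbons")
    return sp3_carbons / total_carbons


cyclohexane = fsp3(sp3_carbons=6, total_carbons=6)   # fully saturated ring
benzene = fsp3(sp3_carbons=0, total_carbons=6)       # fully aromatic ring
toluene = fsp3(sp3_carbons=1, total_carbons=7)       # one methyl carbon
```

Flat aromatic scaffolds sit near 0 and saturated scaffolds near 1, which is why the value is often read as a rough proxy for three-dimensionality.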
Ivanenkov et al. developed MCE-18 to quantify molecular novelty and complexity by counting specific structural features, such as chiral centers and spiro junctions.
MCE-18 helps identify compounds with unique structural characteristics, which is important for discovering novel therapeutics and securing intellectual property. However, a high complexity score may correlate with synthetic challenges and does not necessarily indicate favorable pharmacokinetics or safety profiles.
Example: A molecule with multiple chiral centers and spiro junctions would have a high MCE-18 score, reflecting its structural complexity.
Advancements in artificial intelligence have led to the development of machine learning models that can implicitly evaluate molecular properties by learning patterns from large datasets.
Hu et al. trained autoencoders (AEs) to classify drug-like molecules.
Example: The AE accurately classifies compounds similar to those in the training set but may misclassify innovative molecules.
Hooshmand et al. and Lee et al. developed models that leverage unlabeled data.
Example: These methods can suggest novel drug candidates but need careful analysis to understand the predictions.
Graph neural networks (GNNs) have gained popularity for their ability to model molecular structures as graphs, with atoms as nodes and bonds as edges.
Example: GNNs have been used to predict biological activity and toxicity with high accuracy.
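The core operation of a GNN is message passing: each atom updates its features using those of its bonded neighbors, and a readout pools the atom features into one vector per molecule. The sketch below is a deliberately stripped-down illustration with no learned weights or nonlinearities, which a real GNN would apply at each step. The ethanol graph, the two-element feature encoding, and the function names are all assumptions made for this example.

```python
from typing import Dict, List

# Heavy-atom graph of ethanol (C-C-O): atom index -> bonded atom indices.
ADJACENCY: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}

# Toy atom features: [is_carbon, is_oxygen].
FEATURES: Dict[int, List[float]] = {
    0: [1.0, 0.0],
    1: [1.0, 0.0],
    2: [0.0, 1.0],
}


def message_passing_step(adjacency: Dict[int, List[int]],
                         features: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """New atom feature = own feature + sum of neighbor features."""
    updated: Dict[int, List[float]] = {}
    for atom, own in features.items():
        aggregated = list(own)
        for neighbor in adjacency[atom]:
            for i, value in enumerate(features[neighbor]):
                aggregated[i] += value
        updated[atom] = aggregated
    return updated


def readout(features: Dict[int, List[float]]) -> List[float]:
    """Sum-pool atom features into a fixed-length molecule vector."""
    size = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) for i in range(size)]


after_one_step = message_passing_step(ADJACENCY, FEATURES)
molecule_vector = readout(after_one_step)
```

After one step the central carbon "knows" it is bonded to an oxygen, information that was not present in its initial feature. Stacking several such steps lets each atom see progressively larger neighborhoods.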
Generative models, including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), are used to create new molecules.
Example: MolFilterGAN uses a progressively augmented GAN to triage AI-designed molecules, improving the quality of generated compounds.
Beker et al. enhanced classification models by integrating uncertainty quantification.
Example: The model achieved high accuracy in distinguishing drugs but struggled with common hydrocarbons not well-represented in the training data.
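One generic way to attach uncertainty to a classifier, not necessarily the approach taken by Beker et al., is to collect several predictions for the same molecule (from an ensemble or from repeated stochastic forward passes) and abstain when they disagree. In the sketch below the function names, the 0.5 decision threshold, and the 0.15 spread cutoff are illustrative assumptions, and the probabilities are made up.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def summarize_predictions(probabilities: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and population standard deviation of the predictions."""
    return mean(probabilities), pstdev(probabilities)


def classify_with_abstention(probabilities: Sequence[float],
                             threshold: float = 0.5,
                             max_spread: float = 0.15) -> str:
    """Label a molecule, or abstain when the predictions disagree too much."""
    average, spread = summarize_predictions(probabilities)
    if spread > max_spread:
        return "uncertain"
    return "drug-like" if average >= threshold else "non-drug-like"


# Made-up probabilities from three hypothetical ensemble members.
confident_call = classify_with_abstention([0.90, 0.92, 0.88])
disputed_call = classify_with_abstention([0.10, 0.90, 0.50])
```

A molecule type that is poorly represented in the training data would ideally land in the "uncertain" bucket instead of receiving a confident but unreliable label.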
Despite the progress in drug-likeness prediction, several challenges persist, particularly in evaluating AI-designed molecules.
Example: Using pre-trained models on large, varied datasets and fine-tuning them with specific AI-designed compounds.
Example: Feature attribution methods can highlight which molecular features contribute most to a prediction.
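Occlusion is one of the simplest feature attribution methods: replace one input feature with a baseline value and record how much the prediction changes. The sketch below is a minimal illustration; `toy_model`, its coefficients, and the feature names are hypothetical stand-ins for a trained drug-likeness model.

```python
from typing import Callable, Dict


def occlusion_attribution(model: Callable[[Dict[str, float]], float],
                          features: Dict[str, float],
                          baseline: float = 0.0) -> Dict[str, float]:
    """Attribution of a feature = full prediction - prediction with it occluded."""
    full_prediction = model(features)
    attributions: Dict[str, float] = {}
    for name in features:
        occluded = dict(features)      # copy so the input is left unchanged
        occluded[name] = baseline
        attributions[name] = full_prediction - model(occluded)
    return attributions


def toy_model(features: Dict[str, float]) -> float:
    """Hypothetical linear scorer used only to exercise the attribution code."""
    return (0.5 * features["aromatic_rings"]
            - 0.25 * features["rotatable_bonds"]
            + 1.0)


molecule = {"aromatic_rings": 2.0, "rotatable_bonds": 4.0}
attributions = occlusion_attribution(toy_model, molecule)
```

For a linear model the attributions simply recover each term's contribution. For a nonlinear model they depend on the chosen baseline and can miss interactions between features, so they are best read as a rough guide.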
Example: A hybrid model that filters compounds based on RO5 and then applies a GNN for detailed property prediction.
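Such a two-stage pipeline might be organized as follows. This is a sketch under stated assumptions: descriptors are precomputed and stored in plain dictionaries, the RO5 cutoffs are the commonly cited ones, and `placeholder_scorer` is a made-up function standing in for a trained GNN. The candidate names and values are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Molecule = Dict[str, float]


def passes_ro5(m: Molecule, max_violations: int = 1) -> bool:
    """Cheap first-stage filter using the commonly cited RO5 cutoffs."""
    violations = sum([
        m["mol_weight"] > 500,
        m["logp"] > 5,
        m["h_bond_donors"] > 5,
        m["h_bond_acceptors"] > 10,
    ])
    return violations <= max_violations


def triage(candidates: Dict[str, Molecule],
           scorer: Callable[[Molecule], float],
           top_k: int) -> List[Tuple[str, float]]:
    """Filter with RO5, score the survivors, and return the top_k by score."""
    survivors = {name: m for name, m in candidates.items() if passes_ro5(m)}
    scored = [(name, scorer(m)) for name, m in survivors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def placeholder_scorer(m: Molecule) -> float:
    """Stand-in for a GNN prediction; favors logP near 2.5 (arbitrary choice)."""
    return 1.0 / (1.0 + abs(m["logp"] - 2.5))


CANDIDATES: Dict[str, Molecule] = {
    "cand_a": {"mol_weight": 350.0, "logp": 2.5, "h_bond_donors": 2, "h_bond_acceptors": 5},
    "cand_b": {"mol_weight": 820.0, "logp": 6.8, "h_bond_donors": 7, "h_bond_acceptors": 14},
    "cand_c": {"mol_weight": 420.0, "logp": 4.0, "h_bond_donors": 1, "h_bond_acceptors": 6},
}

ranked = triage(CANDIDATES, placeholder_scorer, top_k=2)
```

Running the inexpensive filter first means the costlier model only sees compounds that already satisfy basic physicochemical constraints. The trade-off is that a strict filter can discard unusual molecules before the model has a chance to evaluate them.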
Real-world applications illustrate both the practical utility and the limitations of these methods.
Case Study 1: Discovery of DDR1 Kinase Inhibitors
Case Study 2: MolFilterGAN Application
Predicting drug-likeness remains a critical and complex aspect of drug discovery. While empirical methods like RO5 and QED provide valuable initial guidance, they may not suffice for evaluating highly novel compounds generated by advanced AI methods. AI-based approaches offer powerful tools to capture intricate patterns and explore vast chemical spaces but come with challenges related to generalization, interpretability, and data dependency.
Future directions involve integrating empirical knowledge with advanced AI techniques, leveraging transfer learning, and enhancing model interpretability. By combining the strengths of various methods and addressing their limitations, researchers can improve the triaging of AI-designed molecules, ultimately accelerating the discovery of new therapeutics.
References: