Blaabjerg et al. reported RaSP, a deep learning (DL) method that rapidly and accurately predicts changes in protein stability. Understanding and manipulating protein stability is valuable for protein engineering, the investigation of disease mechanisms, and the development of therapeutic molecules.
Estimating stability changes experimentally is time-consuming and often prohibitively expensive. In silico methods offer a viable alternative to experiments, and methods with improved accuracy are highly sought after.
RaSP integrates pre-trained DL representations and self-supervised learning. It achieves accuracy similar to that of physics-based approaches but runs much faster. For site-saturation mutagenesis, it predicts stability changes at about 0.4 s per residue, irrespective of protein size.
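To make the per-residue figure concrete, the following minimal sketch enumerates the substitutions in a site-saturation scan (19 alternatives at each position) and estimates the run time from the reported 0.4 s/residue. The sequence, the function names, and the assumption that run time scales linearly with sequence length are illustrative only; none of this is taken from the RaSP code.

```python
# Illustrative sketch only: not part of the RaSP code base.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturation_variants(sequence):
    """Yield every single-residue substitution as (position, wild_type, mutant)."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield position, wild_type, mutant


def estimated_runtime_seconds(sequence, seconds_per_residue=0.4):
    """Estimate scan time, assuming a constant cost per residue (0.4 s as reported)."""
    return len(sequence) * seconds_per_residue


sequence = "MKTAYIAKQR"  # hypothetical 10-residue sequence
variants = list(saturation_variants(sequence))

print(len(variants))                        # 190 substitutions (10 positions x 19)
print(estimated_runtime_seconds(sequence))  # 4.0 seconds
```

Under the same assumption, a 300-residue protein would yield 5,700 variants and take roughly two minutes.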
The trained model reproduces Rosetta ΔΔG values, achieving a Pearson correlation coefficient of 0.82 and a mean absolute error (MAE) of 0.73 kcal/mol on an unseen test data set comprising 10 proteins with complete saturation mutagenesis.
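For readers unfamiliar with the two metrics, here is a short sketch of how a Pearson correlation coefficient and an MAE could be computed for paired predicted and reference ΔΔG values. The numbers are invented for illustration and are not data from the paper; the function names are likewise hypothetical.

```python
import math


def pearson_r(predicted, reference):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_r = math.sqrt(sum((r - mean_r) ** 2 for r in reference))
    return covariance / (spread_p * spread_r)


def mean_absolute_error(predicted, reference):
    """Average absolute difference, in the same units as the inputs (kcal/mol here)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Made-up ΔΔG values in kcal/mol, for illustration only.
reference_ddg = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted_ddg = [0.5, 0.5, 2.5, 2.5, 4.5]

print(f"Pearson r = {pearson_r(predicted_ddg, reference_ddg):.2f}")       # Pearson r = 0.94
print(f"MAE = {mean_absolute_error(predicted_ddg, reference_ddg):.2f}")  # MAE = 0.50
```

The two metrics are complementary: the correlation measures how well the predictions track the ranking and trend of the reference values, while the MAE measures the typical size of the error in kcal/mol.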
When benchmarked on experimental stability measurements, RaSP predictions showed Pearson correlation coefficients ranging from 0.57 to 0.79. These values indicate that the accuracy of RaSP is on par with that of Rosetta ΔΔG predictions, even though RaSP is orders of magnitude faster. Further gains in accuracy are limited by the existing experimental data available for training, which suffer from issues such as experiment-to-experiment variation.
When further evaluated on the S669 experimental data set, which consists of 669 protein variants and 94 structures, RaSP performs on par with Rosetta and comparably to several of the best-performing modern methods. RaSP also showed performance similar to that of Rosetta when computationally modeled structures were used as input.
The prediction bias of RaSP is comparable to that of Rosetta, yet RaSP is up to 1000 times faster. This speed enabled a large-scale application: the estimation of about 230 million stability changes, covering almost all single-residue substitutions in all known human proteins.
The proteome-wide stability predictions facilitate the investigation of protein variants that have been observed in human populations. Variants that are common in the human population were found to be significantly depleted among substitutions predicted to be severely destabilizing. Further, benign and pathogenic variants were clearly differentiated: benign variants showed smaller effects on protein stability, whereas many disease-causing mutations were predicted to be destabilizing. This underscores the relevance of protein stability to disease-causing missense mutations.
Resources
Paper: Rapid protein stability prediction using deep learning representations
Code: Rapid protein stability prediction using deep learning representations