Accurately predicting changes in protein melting temperature (ΔTm) is fundamental for assessing protein stability and guiding protein engineering. Multi-modal protein representations have shown great promise in capturing the complex relationships among protein sequences, structures, and functions. In this study, we build ΔTm prediction models on powerful pretrained protein models, namely the protein language models ESM-2, ESM-3, and SaProt and the structure prediction model AlphaFold, and we explore various feature extraction methods to improve prediction accuracy. Using the ESM-3 model, we achieve new state-of-the-art performance on the s571 test dataset, with a Pearson correlation coefficient (PCC) of 0.50. Furthermore, we conduct a fair comparison of the different protein models on the ΔTm prediction task. Our results demonstrate that integrating multi-modal protein representations can advance the prediction of protein melting temperatures.
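The PCC reported above measures the linear agreement between predicted and experimentally measured ΔTm values. The following sketch shows how that metric is computed. It also shows one common way, assumed here, of turning per-residue embeddings into a fixed-length mutation feature: mean pooling followed by a mutant-minus-wild-type difference. The pooling scheme and the helper names (`mean_pool`, `mutation_feature`, `pearson_cc`) are hypothetical illustrations, not necessarily the feature extraction used in this study. The toy embeddings stand in for real ESM-2, ESM-3, or SaProt outputs.

```python
from math import sqrt
from typing import List, Sequence

# Hypothetical helpers for illustration only; not the authors' implementation.


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue embedding vectors (L x D) into one D-dim vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[d] for row in residue_embeddings) / n for d in range(dim)]


def mutation_feature(
    wt_emb: Sequence[Sequence[float]], mut_emb: Sequence[Sequence[float]]
) -> List[float]:
    """Assumed feature: pooled mutant embedding minus pooled wild-type embedding."""
    wt, mut = mean_pool(wt_emb), mean_pool(mut_emb)
    return [m - w for m, w in zip(mut, wt)]


def pearson_cc(pred: Sequence[float], true: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured ΔTm values."""
    if len(pred) != len(true) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    # Covariance and variances; the shared 1/n factors cancel in the ratio.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    if var_p == 0 or var_t == 0:
        raise ValueError("PCC is undefined when either input is constant")
    return cov / sqrt(var_p * var_t)


if __name__ == "__main__":
    wt = [[1.0, 2.0], [3.0, 4.0]]   # toy wild-type embeddings: 2 residues x 2 dims
    mut = [[1.0, 2.0], [5.0, 6.0]]  # toy mutant embeddings
    print(mutation_feature(wt, mut))  # [1.0, 1.0]
    print(pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

In practice, `scipy.stats.pearsonr` or `statistics.correlation` (Python 3.10+) computes the same coefficient. The explicit version is shown only to make the formula visible.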