The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA

This study introduces a systematic framework to compare the efficacy of Large Language Models (LLMs) for fine-tuning across various cheminformatics tasks. Employing a uniform training methodology, we assessed three well-known models-RoBERTa, BART, and LLaMA-on their ability to predict molecular properties using the Simplified Molecular Input Line Entry System (SMILES) as a universal molecular representation format. Our comparative analysis involved pre-training 18 configurations of these models, with varying parameter sizes and dataset scales, followed by fine-tuning them on six benchmarking tasks from DeepChem. We maintained consistent training environments across models to ensure reliable comparisons. This approach allowed us to assess the influence of model type, size, and training dataset size on model performance. Specifically, we found that LLaMA-based models generally offered the lowest validation loss, suggesting their superior adaptability across tasks and scales. However, we observed that absolute validation loss is not a definitive indicator of model performance - contradicts previous research - at least for fine-tuning tasks: instead, model size plays a crucial role. Through rigorous replication and validation, involving multiple training and fine-tuning cycles, our study not only delineates the strengths and limitations of each model type but also provides a robust methodology for selecting the most suitable LLM for specific cheminformatics applications. This research underscores the importance of considering model architecture and dataset characteristics in deploying AI for molecular property prediction, paving the way for more informed and effective utilization of AI in drug discovery and related fields.

翻译：本研究提出一个系统化框架，用于比较大型语言模型（LLMs）在各类化学信息学任务中的微调效能。采用统一训练方法，我们评估了三种知名模型——RoBERTa、BART和LLaMA——基于简化分子线性输入规范（SMILES）作为通用分子表征格式时预测分子性质的能力。对比分析涉及对18种不同参数规模与数据集大小的模型配置进行预训练，随后在DeepChem的六个基准任务上进行微调。我们确保所有模型保持一致的训练环境以保障比较的可靠性。此方法使我们能够评估模型类型、规模及训练数据集大小对模型性能的影响。具体而言，我们发现基于LLaMA的模型通常展现出最低的验证损失，表明其在不同任务与规模下具有更优的适应性。然而，我们观察到绝对验证损失并非模型性能的确定性指标——这与先前研究相悖——至少在微调任务中如此：相反，模型规模起到关键作用。通过包含多次训练与微调循环的严谨重复验证，本研究不仅厘清了各类模型的优势与局限，还为选择特定化学信息学应用中最适合的LLM提供了可靠方法。本研究强调了在分子性质预测中部署人工智能时需考虑模型架构与数据集特征的重要性，为药物发现及相关领域更明智且高效地利用人工智能铺平道路。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日