Molecular Relational Learning (MRL), which aims to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, have emerged as a promising route to efficient and effective MRL. Despite their potential, these methods rely predominantly on textual data and therefore do not fully exploit the structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates this underutilization of information, as it hinders the sharing of interaction mechanisms learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrates the graph information of the two molecules in a pair. To achieve unified MRL, MolTC develops a dynamic parameter-sharing strategy for cross-dataset information sharing. Moreover, to train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and construct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN-based and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.