Molecular Relational Learning (MRL), which aims to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, have emerged as a promising approach to efficient and effective MRL. Despite their potential, these methods rely predominantly on textual data and thus do not fully exploit the structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates this underutilization of information, as it hinders the sharing of interaction mechanisms learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrates the graph information of the two molecules in a pair. To achieve unified MRL, MolTC develops a dynamic parameter-sharing strategy for cross-dataset information sharing. Moreover, to train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and construct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN-based and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.