Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder. This integration is achieved by a 3D molecule-text projector that bridges the 3D molecular encoder's representation space and the LM's input space. Moreover, to enhance 3D-MoLM's cross-modal molecular understanding and instruction-following abilities, we curate a 3D molecule-centric instruction tuning dataset, 3D-MoIT. Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM integrates the 3D molecular encoder with the LM. It significantly surpasses existing baselines on downstream tasks, including molecule-text retrieval, molecule captioning, and more challenging open-text molecular QA tasks, especially those focusing on 3D-dependent properties. We release our code and datasets at https://github.com/lsh0520/3D-MoLM.
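To make the role of the 3D molecule-text projector concrete, the following is a minimal sketch of the general idea: per-atom embeddings from a 3D molecular encoder are mapped into the LM's input embedding space and placed in front of the text token embeddings. The abstract does not specify the projector's form, so a single linear layer is assumed here purely for illustration; the names `make_linear`, `project`, and `build_lm_input`, the dimensions, and the toy embeddings are all hypothetical and are not taken from the released code.

```python
import random

def make_linear(d_in, d_out, seed=0):
    """Create a randomly initialised linear layer (assumed projector form)."""
    rng = random.Random(seed)
    scale = 1.0 / d_in ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(d_in)]
              for _ in range(d_out)]
    bias = [0.0] * d_out
    return weight, bias

def project(mol_embeddings, weight, bias):
    """Map each encoder embedding (size d_in) to the LM input size (d_out)."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b
         for row, b in zip(weight, bias)]
        for vec in mol_embeddings
    ]

def build_lm_input(mol_tokens, text_tokens):
    """Prepend projected molecule tokens to the text token embeddings."""
    return mol_tokens + text_tokens

# Toy dimensions (hypothetical): encoder width 4, LM embedding width 6.
D_MOL, D_LM = 4, 6

# Stand-in for the 3D encoder output: one embedding per atom (3 atoms).
mol_embeddings = [[0.1 * (i + j) for j in range(D_MOL)] for i in range(3)]
# Stand-in for the LM's embeddings of a 5-token text prompt.
text_embeddings = [[0.0] * D_LM for _ in range(5)]

weight, bias = make_linear(D_MOL, D_LM)
mol_tokens = project(mol_embeddings, weight, bias)
lm_input = build_lm_input(mol_tokens, text_embeddings)

print(len(lm_input), len(lm_input[0]))  # prints: 8 6
```

In this sketch the LM would consume `lm_input` as a single sequence, so the molecule is presented to it in the same embedding space as ordinary text tokens. A learned projector of a different architecture could replace `project` without changing the surrounding flow.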