Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder. This integration is achieved by a 3D molecule-text projector that bridges the 3D molecular encoder's representation space and the LM's input space. Moreover, to enhance 3D-MoLM's cross-modal molecular understanding and instruction-following ability, we curate a 3D molecule-centric instruction tuning dataset, 3D-MoIT. Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM integrates the 3D molecular encoder with the LM. It significantly surpasses existing baselines on downstream tasks, including molecule-text retrieval, molecule captioning, and more challenging open-text molecular QA tasks, especially those focusing on 3D-dependent properties.
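To make the role of the 3D molecule-text projector more concrete, the following is a minimal sketch of the general idea: vectors from a molecular encoder are mapped into the LM's input embedding space and combined with text token embeddings. The abstract does not specify the projector's architecture, so the single linear map used here, the choice to prepend molecule tokens to the text, the toy dimensions, and all function names (`make_projector`, `project`, `build_lm_input`) are illustrative assumptions rather than the 3D-MoLM implementation.

```python
import random


def make_projector(d_mol, d_lm, seed=0):
    """Create a randomly initialised linear map (hypothetical stand-in for the projector).

    Returns a weight matrix of shape (d_mol, d_lm) and a zero bias of length d_lm.
    """
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(d_lm)] for _ in range(d_mol)]
    bias = [0.0] * d_lm
    return weight, bias


def project(mol_embeddings, weight, bias):
    """Map each molecule embedding (length d_mol) into the LM input space (length d_lm)."""
    d_lm = len(bias)
    projected = []
    for vec in mol_embeddings:
        out = [
            bias[j] + sum(vec[i] * weight[i][j] for i in range(len(vec)))
            for j in range(d_lm)
        ]
        projected.append(out)
    return projected


def build_lm_input(mol_embeddings, text_embeddings, weight, bias):
    """Prepend projected molecule tokens to the text token embeddings (an assumption)."""
    return project(mol_embeddings, weight, bias) + text_embeddings


# Toy dimensions, chosen only for illustration.
D_MOL, D_LM = 4, 6
weight, bias = make_projector(D_MOL, D_LM)

# Three placeholder "molecule tokens" standing in for 3D encoder outputs.
mol_tokens = [[0.1 * (i + j) for j in range(D_MOL)] for i in range(3)]
# Five placeholder text token embeddings already in the LM input space.
text_tokens = [[0.0] * D_LM for _ in range(5)]

lm_input = build_lm_input(mol_tokens, text_tokens, weight, bias)
print(len(lm_input), len(lm_input[0]))  # prints: 8 6
```

In this sketch the LM would receive a sequence of eight vectors, three derived from the molecule and five from the text, all with the same width, which is what allows a single model to attend over both modalities. In a trained system the projector's parameters would be learned, for example during the molecule-text alignment stage mentioned above.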