Rapid advances in AI tools are expected to offer unprecedented assistance to research in the natural sciences, including chemistry. However, neither existing unimodal, task-specific specialist models nor emerging general large multimodal models (LMMs) can cover the wide range of chemical data modalities and task categories. To meet the real needs of chemists, there is a pressing need for a cross-modal Chemical General Intelligence (CGI) system that harnesses the potential of LMMs to serve as a truly practical and useful research assistant. In this work, we introduce ChemDFM-X, the first Cross-modal Dialogue Foundation Model for Chemistry. Diverse multimodal data are generated from an initial modality through approximate calculations and task-specific model predictions. This strategy yields sufficient chemical training corpora while significantly reducing cost, resulting in an instruction-tuning dataset of 7.6M entries. After instruction fine-tuning, ChemDFM-X is evaluated in extensive experiments on a variety of chemical tasks across data modalities. The results demonstrate the capacity of ChemDFM-X for multimodal and inter-modal knowledge comprehension. ChemDFM-X marks a significant milestone toward aligning all modalities in chemistry, a step closer to CGI.