Rapid developments in AI tools are expected to offer unprecedented assistance to research in the natural sciences, including chemistry. However, neither existing unimodal, task-specific specialist models nor emerging general large multimodal models (LMMs) can cover the wide range of chemical data modalities and task categories. To address the real demands of chemists, a cross-modal Chemical General Intelligence (CGI) system that exploits the potential of LMMs to serve as a truly practical and useful research assistant is greatly needed. In this work, we introduce the first Cross-modal Dialogue Foundation Model for Chemistry (ChemDFM-X). Diverse multimodal data are generated from an initial modality through approximate calculations and task-specific model predictions. This strategy creates sufficient chemical training corpora while significantly reducing cost, resulting in an instruction-tuning dataset containing 7.6M entries. After instruction fine-tuning, ChemDFM-X is evaluated in extensive experiments on different chemical tasks across various data modalities. The results demonstrate the capacity of ChemDFM-X for multimodal and inter-modal knowledge comprehension. ChemDFM-X marks a significant milestone toward aligning all modalities in chemistry, a step closer to CGI.