Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require processing visual information, which existing chemical LLMs cannot handle effectively. This creates a growing need for models that can integrate multimodal information in the chemical domain. In this paper, we introduce \textbf{ChemVLM}, an open-source multimodal large language model designed specifically for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. For comprehensive evaluation, we develop three datasets targeting Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on these tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks. Our model is available at https://huggingface.co/AI4Chem/ChemVLM-26B.