The recent success of large foundation models in artificial intelligence has prompted the emergence of chemical pre-trained models. Despite growing interest in large molecular pre-trained models that provide informative representations for downstream tasks, attempts at multimodal pre-training in the molecular domain have been limited. To address this, we present a novel multimodal molecular pre-trained model that incorporates the modalities of structure and biochemical properties, drawing inspiration from recent advances in multimodal learning techniques. Our proposed pipeline of data handling and training objectives aligns the structure and property features in a common embedding space, which enables the model to capture bidirectional information between a molecule's structure and its properties. Together, these contributions yield synergistic knowledge, allowing us to tackle both multimodal and unimodal downstream tasks with a single model. Through extensive experiments, we demonstrate that our model shows strong capabilities in solving various meaningful chemical challenges, including conditional molecule generation, property prediction, molecule classification, and reaction prediction.