Understanding molecules and their textual descriptions via molecule language models (MoLM) has recently attracted a surge of interest among researchers. However, the field of MoLM faces unique challenges due to 1) the limited amount of molecule-text paired data and 2) missing expertise, which arises because experts focus on specialized areas. To this end, we propose AMOLE, which 1) augments molecule-text pairs with a structural similarity preserving loss, and 2) transfers expertise between molecules. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery.