Current approaches to molecular understanding focus predominantly on the descriptive aspect of human perception, providing broad, topic-level insights. The referential aspect, which links molecular concepts to specific structural components, remains largely unexplored. To address this gap, we propose a molecular grounding benchmark designed to evaluate a model's referential abilities. We align molecular grounding with established conventions in NLP, cheminformatics, and molecular science, showcasing the potential of NLP techniques to advance molecular understanding within the AI for Science movement. Furthermore, we construct the largest molecular understanding benchmark to date, comprising 79k question-answer (QA) pairs, and develop a multi-agent grounding prototype as a proof of concept. This system outperforms existing models, including GPT-4o, and its grounding outputs have been integrated to enhance traditional tasks such as molecular captioning and ATC (Anatomical Therapeutic Chemical) classification.