Large Language Models (LLMs) have recently demonstrated great potential in natural language-driven molecule discovery. However, existing datasets and benchmarks for molecule-text alignment are predominantly built on one-to-one mappings: they measure an LLM's ability to retrieve a single, pre-defined answer rather than its creative potential to generate diverse yet equally valid molecular candidates. To address this gap, we propose Speak-to-Structure (S^2-Bench), the first benchmark to evaluate LLMs in open-domain natural language-driven molecule generation. S^2-Bench is specifically designed around one-to-many relationships, challenging LLMs to exhibit genuine molecular understanding and open-ended generation capabilities. Our benchmark includes three key tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom), each probing a different aspect of molecule discovery. We also introduce OpenMolIns, a large-scale instruction-tuning dataset that enables Llama3.1-8B to surpass the most powerful LLMs, such as GPT-4o and Claude-3.5, on S^2-Bench. Our comprehensive evaluation of 31 LLMs shifts the focus from simple pattern recall to realistic molecular design, paving the way for more capable LLMs in natural language-driven molecule discovery. Our code and datasets are fully accessible through the GitHub repository at https://github.com/phenixace/S2-TOMG-Bench and through Hugging Face Datasets at https://huggingface.co/datasets/phenixace/S2-TOMG-Bench.