Large Language Models (LLMs) can aid synthesis planning in chemistry, but standard prompting methods often yield hallucinated or outdated suggestions. We study LLM interactions with a reaction knowledge graph by casting reaction path retrieval as a Text2Cypher (natural language to graph query) generation problem, and define single- and multi-step retrieval tasks. We compare zero-shot prompting to one-shot variants using static, random, and embedding-based exemplar selection, and assess a checklist-driven validator/corrector loop. To evaluate our framework, we consider query validity and retrieval accuracy. We find that one-shot prompting with aligned exemplars consistently performs best. Our checklist-style self-correction loop mainly improves executability in zero-shot settings and offers limited additional retrieval gains once a good exemplar is present. We provide a reproducible Text2Cypher evaluation setup to facilitate further work on KG-grounded LLMs for synthesis planning. Code is available at https://github.com/Intelligent-molecular-systems/KG-LLM-Synthesis-Retrieval.
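To make the embedding-based exemplar selection concrete, here is a minimal sketch of choosing the one-shot exemplar whose question embedding is closest (by cosine similarity) to the user's query embedding. All names, the exemplar pool, the toy 3-dimensional embeddings, and the Cypher snippets (including node labels like `Reaction`/`Molecule`) are hypothetical illustrations, not the paper's actual schema; in practice the embeddings would come from a sentence encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_exemplar(query_embedding, exemplars):
    """Return the (question, cypher) exemplar most similar to the query."""
    return max(exemplars, key=lambda ex: cosine(query_embedding, ex["embedding"]))

# Hypothetical exemplar pool; embeddings are toy values for illustration.
exemplars = [
    {
        "question": "Which reactions produce molecule X?",
        "cypher": "MATCH (r:Reaction)-[:PRODUCES]->(m:Molecule {id: $x}) RETURN r",
        "embedding": [0.9, 0.1, 0.0],
    },
    {
        "question": "Find a two-step reaction path from A to B.",
        "cypher": "MATCH p = (a:Molecule {id: $a})-[:PRODUCES*2]->"
                  "(b:Molecule {id: $b}) RETURN p",
        "embedding": [0.1, 0.9, 0.2],
    },
]

# A query about single-step product retrieval lands near the first exemplar.
best = select_exemplar([0.85, 0.15, 0.05], exemplars)
```

The selected exemplar's question/Cypher pair is then placed in the prompt ahead of the user's natural-language question, giving the LLM an aligned demonstration of the target query shape.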