Designing molecules with target properties is most useful when candidate structures are accompanied by feasible synthetic routes. We introduce My Chemical Harness, a route-native evolutionary framework for goal-directed molecular design in which the search population consists of executable synthetic pathways rather than isolated molecular graphs. Each route is built from purchasable building blocks and reaction templates, executed by deterministic chemistry tools, and scored by task-specific molecular oracles. Large language models (LLMs) are used only as strategy controllers that select high-level preferences over route length, move type, reaction families, motifs, and exploration pressure, while local code performs route construction, validation, deduplication, scoring, selection, and memory updates. This separation lets the LLM guide exploration without allowing it to introduce hallucinated products or unsupported reaction steps. On a soluble epoxide hydrolase (sEH) proxy task, our LLM agent improves over single-pass LLM and deterministic controllers, reaching state-of-the-art performance on the sEH score, synthetic accessibility score, and AiZynthFinder success rate. These results suggest that constrained LLM agents can play a significant role in molecular discovery without requiring training, fine-tuning, or dedicated generative models.
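The division of labor described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in chosen for illustration and is not the paper's implementation: the building blocks are single letters, the "reaction templates" are string-joining functions, `toy_oracle` replaces the sEH proxy, and `stub_controller` replaces the LLM call. The point is only the control flow: the controller returns high-level preferences (a `Strategy`), and local deterministic code builds, validates, deduplicates, scores, and selects routes, so no product can appear that the templates did not generate.

```python
import random
from dataclasses import dataclass

# Hypothetical toy chemistry: NOT real building blocks or reaction templates.
BUILDING_BLOCKS = ["A", "B", "C", "D"]
TEMPLATES = {
    "amide": lambda x, y: f"({x}-N-{y})",
    "suzuki": lambda x, y: f"({x}-C-{y})",
}

@dataclass(frozen=True)
class Strategy:
    """High-level preferences; the only thing the controller may emit."""
    max_steps: int
    preferred_families: tuple
    mutation_rate: float

@dataclass(frozen=True)
class Route:
    """A starting block plus an ordered tuple of (family, block) steps."""
    start: str
    steps: tuple

def execute(route):
    """Deterministically run a route; return None if any step is unsupported."""
    if route.start not in BUILDING_BLOCKS:
        return None
    product = route.start
    for family, block in route.steps:
        if family not in TEMPLATES or block not in BUILDING_BLOCKS:
            return None
        product = TEMPLATES[family](product, block)
    return product

def toy_oracle(product):
    """Placeholder score (assumption): reward amide-like links, penalize size."""
    return product.count("-N-") - 0.01 * len(product)

def stub_controller(generation, best_score):
    """Stands in for the LLM; a real controller would read search memory."""
    return Strategy(max_steps=3, preferred_families=("amide",), mutation_rate=0.5)

def propose(parent, strategy, rng):
    """Local move: mutate one existing step or append a new one."""
    steps = list(parent.steps)
    new_step = (rng.choice(strategy.preferred_families), rng.choice(BUILDING_BLOCKS))
    if steps and rng.random() < strategy.mutation_rate:
        steps[rng.randrange(len(steps))] = new_step
    elif len(steps) < strategy.max_steps:
        steps.append(new_step)
    return Route(parent.start, tuple(steps))

def evolve(generations=5, pop_size=8, seed=0):
    rng = random.Random(seed)
    population = [Route(block, ()) for block in BUILDING_BLOCKS]
    best_score = float("-inf")
    for generation in range(generations):
        strategy = stub_controller(generation, best_score)
        children = [propose(rng.choice(population), strategy, rng)
                    for _ in range(pop_size)]
        scored = {}
        for route in population + children:
            product = execute(route)          # validation by execution
            if product is None:
                continue
            # Deduplicate by product; keep the first route that reaches it.
            scored.setdefault(product, (product, toy_oracle(product), route))
        ranked = sorted(scored.values(), key=lambda item: item[1], reverse=True)
        survivors = ranked[:pop_size]         # truncation selection with elitism
        population = [route for _, _, route in survivors]
        best_score = survivors[0][1]
    return survivors
```

Calling `evolve(seed=0)` returns a list of `(product, score, route)` tuples sorted by score. Swapping `stub_controller` for an actual LLM call would change only which `Strategy` is returned; route execution and scoring would remain in local code, which is the separation the abstract describes.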