We study the use of large language models (LLMs) for physics instrument design and compare their performance to reinforcement learning (RL). Using only prompting, we give the LLMs task constraints and summaries of prior high-scoring designs; the models then propose complete detector configurations, which we evaluate with the same simulators and reward functions used in RL-based optimization. Although RL yields stronger final designs, we find that modern LLMs consistently generate valid, resource-aware, and physically meaningful configurations, drawing on broad pretrained knowledge of detector design principles and particle--matter interactions despite having no task-specific training. Building on this result, and as a first step toward hybrid design workflows, we pair the LLMs with a dedicated trust-region optimizer, a precursor to future pipelines in which LLMs propose and structure design hypotheses while RL performs reward-driven optimization. These experiments suggest that LLMs are well suited as meta-planners: they can design and orchestrate RL-based optimization studies, define search strategies, and coordinate multiple interacting components within a unified workflow. In doing so, they point toward automated, closed-loop instrument design in which much of the human effort required to structure and supervise optimization is reduced.
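A minimal sketch of the prompt-propose-evaluate loop described above, assuming hypothetical `llm`, `simulate`, and `reward` callables (none of these names come from the paper): the LLM is prompted with the task constraints plus summaries of prior high-scoring designs, proposes a complete configuration, and the same simulator and reward function used for the RL baseline score it.

```python
import json

def propose_and_evaluate(llm, simulate, reward, constraints, n_rounds=10, top_k=5):
    """Illustrative loop only; the actual study's interfaces may differ."""
    history = []  # list of (score, config) pairs accumulated across rounds
    for _ in range(n_rounds):
        # Summarize the best designs seen so far and feed them back into the prompt.
        best = sorted(history, key=lambda sc: sc[0], reverse=True)[:top_k]
        prompt = (
            "Design a detector configuration as JSON.\n"
            f"Constraints: {json.dumps(constraints)}\n"
            f"Prior high-scoring designs: {json.dumps([cfg for _, cfg in best])}"
        )
        config = json.loads(llm(prompt))   # LLM proposes a complete configuration
        score = reward(simulate(config))   # same simulator/reward as the RL optimization
        history.append((score, config))
    return max(history, key=lambda sc: sc[0])
```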