Benchmarking Autonomy in Scientific Experiments: A Hierarchical Taxonomy for Autonomous Large-Scale Facilities

The transition from automated data collection to fully autonomous discovery requires a shared vocabulary to benchmark progress. While the automotive industry relies on the SAE J3016 standard, current taxonomies for autonomous science presuppose an owner-operator model that is incompatible with the operational rigidities of Large-Scale User Facilities. Here, we propose the Benchmarking Autonomy in Scientific Experiments (BASE) Scale, a six-level taxonomy (Levels 0-5) adapted specifically to these constraints. Unlike owner-operator settings, User Facilities require zero-shot deployment, in which agents must operate immediately without extensive training periods. We define the specific technical requirements for each tier and identify the Inference Barrier (Level 3) as the critical latency threshold at which decisions shift from scalar feedback to semantic digital twins. Fundamentally, this level extends the decision manifold from spatial exploration to temporal gating, enabling the agent to synchronise acquisition with the onset of transient physical events. By establishing these operational definitions, the BASE Scale provides facility directors, funding bodies, and beamline scientists with a standardised metric to assess risk, define liability, and quantify the intelligence of experimental workflows.
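
As an illustration only, the tier structure described above could be encoded as a small data type. The sketch below is not taken from the BASE specification: the names `BaseLevel`, `INFERENCE_BARRIER`, `decision_feedback`, and `supports_temporal_gating` are hypothetical, the tiers other than Level 3 are left unnamed because the abstract does not name them, and the code assumes that tiers above Level 3 retain the Level 3 capabilities.

```python
from enum import IntEnum


class BaseLevel(IntEnum):
    """BASE Scale tiers, numbered 0-5 as stated in the abstract.

    Only Level 3 (the Inference Barrier) is named in the abstract, so the
    members here are bare numbers rather than invented tier names.
    """
    LEVEL_0 = 0
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4
    LEVEL_5 = 5


# The abstract identifies Level 3 as the Inference Barrier.
INFERENCE_BARRIER = BaseLevel.LEVEL_3


def decision_feedback(level: BaseLevel) -> str:
    """Return the kind of feedback that drives decisions at a given tier.

    Assumption: tiers at or above the Inference Barrier use semantic
    digital twins, and tiers below it use scalar feedback.
    """
    if level >= INFERENCE_BARRIER:
        return "semantic digital twin"
    return "scalar feedback"


def supports_temporal_gating(level: BaseLevel) -> bool:
    """Return True if the tier's decision space includes temporal gating.

    Assumption: temporal gating first appears at Level 3 and is kept by
    higher tiers; lower tiers are limited to spatial exploration.
    """
    return level >= INFERENCE_BARRIER
```

Under these assumptions, `decision_feedback(BaseLevel.LEVEL_2)` reports scalar feedback, while `supports_temporal_gating(BaseLevel.LEVEL_3)` is true. An ordered integer enumeration is used so that tiers can be compared directly against the barrier.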
