Statistical models in high-energy physics formally encode the relationship between observed data, physics parameters of interest, and experimental and theoretical uncertainties. Likelihood-based inference is the central tool for precision measurements, effective field theory fits, and cross-analysis combinations. Consequently, there is an increasing need for machine-readable, descriptive, and portable model representations. Existing formats such as ROOT workspaces, pyhf JSON, and CMS DataCards provide valuable capabilities but remain tied to specific software stacks and offer no universal standard for exchange, validation, or long-term preservation. We introduce HS3, the High-Energy Physics Statistics Serialization Standard, an implementation-agnostic, human-readable, and extensible serialization format for statistical models. HS3 is designed such that new statistical constructs can be incorporated through backward-compatible extensions, while inference procedures and implementation-specific execution details remain the responsibility of downstream frameworks. HS3 represents likelihoods as computational graphs composed of named distributions, functions, datasets, domains, and analysis prescriptions. It supports binned and unbinned likelihoods as well as hierarchical composite models. HS3 is convertible from and to ROOT/RooFit and is a superset of pyhf. We describe the design principles, structure, and semantics of HS3 and summarize existing implementations in C++, Python, and Julia. We also present early applications to public likelihoods on HEPData, cross-framework validation, and reproducibility efforts. HS3 provides a foundation for FAIR (Findable, Accessible, Interoperable, Reusable), long-lived statistical models at the LHC and beyond. The standard is intended to serve the broader scientific community and to evolve over time for application across a wide range of domains.
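As a rough illustration of the graph-of-named-components idea described above, the following minimal Python sketch builds a small model as a plain dictionary, round-trips it through JSON, and checks that every name referenced by a likelihood or analysis resolves to a declared component. The key names, type strings, and values are assumptions chosen to mirror the component categories listed in the abstract; they are not a normative copy of the HS3 schema, and the helper `unresolved_references` is hypothetical.

```python
import json

# Illustrative layout only: section names and "type" strings are assumptions
# modeled on the categories named in the abstract, not the official schema.
model = {
    "distributions": [
        {"name": "signal", "type": "gaussian_dist",
         "x": "mass", "mean": "mu", "sigma": "width"},
    ],
    "functions": [],
    "data": [
        {"name": "obs", "type": "unbinned",
         "axes": [{"name": "mass", "min": 100.0, "max": 150.0}],
         "entries": [[124.8], [125.3], [126.1]]},  # made-up example values
    ],
    "domains": [
        {"name": "default_domain", "type": "product_domain",
         "axes": [{"name": "mu", "min": 120.0, "max": 130.0},
                  {"name": "width", "min": 0.5, "max": 5.0}]},
    ],
    "likelihoods": [
        {"name": "main", "distributions": ["signal"], "data": ["obs"]},
    ],
    "analyses": [
        {"name": "mass_fit", "likelihood": "main",
         "domains": ["default_domain"], "parameters_of_interest": ["mu"]},
    ],
}


def names(doc, section):
    """Collect the names declared in one top-level section."""
    return {item["name"] for item in doc.get(section, [])}


def unresolved_references(doc):
    """Return referenced component names that are not declared anywhere."""
    missing = []
    for lh in doc.get("likelihoods", []):
        missing += [n for n in lh["distributions"] if n not in names(doc, "distributions")]
        missing += [n for n in lh["data"] if n not in names(doc, "data")]
    for an in doc.get("analyses", []):
        if an["likelihood"] not in names(doc, "likelihoods"):
            missing.append(an["likelihood"])
        missing += [n for n in an["domains"] if n not in names(doc, "domains")]
    return missing


# Serialize to human-readable JSON and read it back.
text = json.dumps(model, indent=2)
restored = json.loads(text)
```

Because components refer to each other only by name, a check like this can run without any inference framework; evaluating the likelihood itself is left to downstream tools, in line with the separation described above.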