The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and, more generally, artificial intelligence (AI) models are an important target for this generalization because of the ever-increasing pace at which AI is transforming scientific domains such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for applying these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.