Artificial Intelligence for Science (AI4S) is an emerging research field that applies advances in machine learning to complex scientific computing problems, aiming to improve computational efficiency and accuracy. However, because AI4S models are data-driven, they lack the correctness or accuracy guarantees of conventional scientific computing, which poses challenges when deploying them in real-world applications. To mitigate these challenges, more comprehensive benchmarking procedures are needed to better understand AI4S models. This paper introduces a novel benchmarking approach, called structural interpretation, which addresses two key requirements: identifying the trusted operating range in the problem space and tracing errors back to their computational components. The method partitions both the problem space and the metric space, enabling a structural exploration of each. Its practical utility and effectiveness are illustrated by applying it to three distinct AI4S workloads: machine-learning force fields (MLFF), jet tagging, and precipitation nowcasting. The benchmarks effectively model the trusted operating range, trace errors, and reveal new perspectives for refining the model, the training process, and the data sampling strategy. This work is part of the SAIBench project, an AI4S benchmarking suite.