TokaMark: A Comprehensive Benchmark for MAST Tokamak Plasma Models

Cécile Rousseau, Samuel Jackson, Rodrigo H. Ordonez-Hurtado, Nicola C. Amorisco, Tobia Boschi, George K. Holt, Andrea Loreti, Eszter Székely, Alexander Whittle, Adriano Agnello, Stanislas Pamela, Alessandra Pascale, Robert Akers, Juan Bernabe Moreno, Sue Thorne, Mykhaylo Zayats

The development and operation of commercially viable fusion energy reactors such as tokamaks require accurate predictions of plasma dynamics from sparse, noisy, and incomplete sensor readings. The complexity of the underlying physics and the heterogeneity of experimental data pose formidable challenges for conventional numerical methods, while simultaneously highlighting the promise of modern data-native AI approaches. A major obstacle to realizing this potential, however, is the lack of curated, openly available datasets and standardized benchmarks. Existing fusion datasets are scarce, fragmented across institutions, facility-specific, and inconsistently annotated, which limits reproducibility and prevents a fair and scalable comparison of AI approaches. In this paper, we introduce TokaMark, a structured benchmark for evaluating AI models on real experimental data collected from the Mega Ampere Spherical Tokamak (MAST). TokaMark provides a comprehensive suite of tools designed to (i) unify access to multi-modal, heterogeneous fusion data, and (ii) harmonize formats, metadata, temporal alignment, and evaluation protocols to enable consistent cross-model and cross-task comparisons. The benchmark includes a curated list of 14 tasks that span a range of physical mechanisms, exploit a variety of diagnostics, and cover multiple operational use cases. A baseline model is provided to facilitate transparent comparison and validation within a unified framework. By establishing a unified benchmark for both the fusion and AI-for-science communities, TokaMark aims to accelerate progress in data-driven, AI-based plasma modeling, contributing to the broader goal of achieving sustainable and stable fusion energy. The benchmark, documentation, and tooling will be fully open-sourced upon acceptance to encourage community adoption and contribution.
