We introduce ColliderML, a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\sqrt{s} = 14$ TeV, mean pile-up $\mu = 200$). ColliderML provides one million events across ten Standard Model and Beyond Standard Model processes, plus extensive single-particle samples, all produced with modern next-to-leading-order matrix element calculations and parton showering, realistic per-event pile-up overlay, a validated OpenDataDetector geometry, and standard reconstructions. The release, hosted on the ML-friendly Hugging Face platform, fills a major gap for machine learning (ML) research on detector-level data. We present the physics coverage and the generation, simulation, digitisation and reconstruction pipeline, describe the data format and access, and report initial collider physics benchmarks.