The workflow from particle collision to physics analysis passes through a series of reconstruction steps that are traditionally modular and disconnected, with no shared representation linking low-level detector data to high-level analysis tasks. We show that casting event reconstruction as a machine learning problem naturally produces such a shared representation. We repurpose a machine learning model trained for particle-flow reconstruction (MLPF) to perform three distinct analysis tasks: jet flavor identification, jet energy regression, and missing momentum regression. By appending the per-particle latent representations learned during reconstruction as additional input features, we substantially improve performance over baselines that use kinematic features alone. We further show that a single linear layer trained using only the latent representations performs competitively with state-of-the-art baseline architectures, and outperforms the baseline for missing momentum regression with approximately 35 times fewer parameters. These results indicate that the latent representations learned during reconstruction encode essential physics information needed for downstream analysis, establishing MLPF as a foundation model and offering a concrete step toward an end-to-end pipeline from detector data to physics analysis.
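The two ways of reusing the reconstruction latents described above (appending them to kinematic inputs, and fitting a single linear layer on them alone) can be illustrated with a minimal NumPy sketch. The abstract does not say how per-particle latents are aggregated for jet- or event-level targets, so this sketch assumes a plain sum over particles and fits the linear layer by least squares on a two-component target standing in for missing (px, py). All names (`LATENT_DIM`, `augment_features`, `pool_latents`, `fit_linear_probe`) and the synthetic data are hypothetical; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical setup: each event is an array of particles, and each particle
# carries a latent vector of size LATENT_DIM taken from a frozen reconstruction
# model. LATENT_DIM and the sum-pooling choice are illustrative assumptions.
LATENT_DIM = 8


def augment_features(kinematics: np.ndarray, latents: np.ndarray) -> np.ndarray:
    """Append per-particle latents to kinematic features along the feature axis."""
    if kinematics.shape[0] != latents.shape[0]:
        raise ValueError("kinematics and latents must describe the same particles")
    return np.concatenate([kinematics, latents], axis=-1)


def pool_latents(events: list) -> np.ndarray:
    """Sum the per-particle latents of each event -> (n_events, latent_dim)."""
    return np.stack([z.sum(axis=0) for z in events])


def fit_linear_probe(events, targets):
    """Least-squares fit of one linear layer (weights + bias) on pooled latents.

    A bias-free linear map applied to every particle and then summed equals the
    same map applied once to the summed latents, so the fit reduces to ordinary
    linear regression; here the bias is added once per event.
    """
    x = pool_latents(events)
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])  # extra column for the bias
    coef, *_ = np.linalg.lstsq(x1, targets, rcond=None)
    return coef[:-1], coef[-1]


def predict(events, weights, bias):
    """Apply the fitted linear layer to the pooled latents of each event."""
    return pool_latents(events) @ weights + bias


# Synthetic demo: a noise-free 2-component target standing in for (px, py).
rng = np.random.default_rng(0)
events = [rng.normal(size=(rng.integers(5, 30), LATENT_DIM)) for _ in range(200)]
w_true = rng.normal(size=(LATENT_DIM, 2))
b_true = np.array([0.5, -1.0])
targets = pool_latents(events) @ w_true + b_true

w_hat, b_hat = fit_linear_probe(events, targets)
n_params = w_hat.size + b_hat.size  # LATENT_DIM * 2 weights + 2 biases
```

The point of the linear probe is that its parameter count, `2 * LATENT_DIM + 2` here, is tiny compared with a full downstream network, so any performance it reaches must come from information already present in the latents.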