High-Energy Physics experiments face a multi-fold increase in data with every new iteration, and the upcoming High-Luminosity LHC upgrade is no exception. These increased data processing requirements force revisions to almost every step of the data processing pipeline. One step in need of an overhaul is particle track reconstruction, also known as tracking. Because the most time-consuming step in tracking is the assignment of hits to particles or track candidates, a Machine Learning-assisted solution is expected to provide significant improvements. This assignment task is the topic of this paper. Taking inspiration from large language models, we consider two approaches: the prediction of the next word in a sentence (the next hit point in a track), and the one-shot prediction of all hits within an event. In an extensive design effort, we experimented with three models based on the Transformer architecture and one model based on the U-Net architecture, each performing track association predictions for collision event hit points. In our evaluation, we consider a spectrum of simple to complex representations of the problem, eliminating designs with lower metrics early on. We report extensive results covering both prediction accuracy (score) and computational performance. For our experiments, we used the REDVID simulation framework, as well as reductions applied to the TrackML data set, to compose five data sets ranging from simple to complex. The results highlight distinct advantages among the different designs in terms of prediction accuracy and computational performance, demonstrating the efficiency of our methodology. Most importantly, the results show the viability of a one-shot, encoder-classifier-based Transformer solution as a practical approach to tracking.