Jet tagging is a critical yet challenging classification task in particle physics. While deep learning has transformed jet tagging and significantly improved performance, the lack of a large-scale public dataset impedes further enhancement. In this work, we present JetClass, a new comprehensive dataset for jet tagging. The JetClass dataset consists of 100 million jets, about two orders of magnitude larger than existing public datasets. A total of 10 types of jets are simulated, including several types unexplored for tagging so far. Building on this large dataset, we propose a new Transformer-based architecture for jet tagging, called Particle Transformer (ParT). By incorporating pairwise particle interactions in the attention mechanism, ParT achieves higher tagging performance than a plain Transformer and surpasses the previous state-of-the-art, ParticleNet, by a large margin. The pre-trained ParT models, once fine-tuned, also substantially enhance the performance on two widely adopted jet tagging benchmarks. The dataset, code, and models are publicly available at https://github.com/jet-universe/particle_transformer.
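To make the idea of pairwise interactions in the attention mechanism concrete, the following is a minimal sketch, not the released ParT implementation. It assumes the pairwise term enters as an additive bias `u[i][j]` on the scaled dot-product scores before the softmax. The function names, the single-head setup, and the toy inputs are illustrative assumptions, and how `u` is derived from particle kinematics is not specified in this abstract, so it is treated here as a plain input.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_pair_bias(q, k, v, u):
    """Single-head attention with an additive pairwise bias (illustrative).

    q, k, v: per-particle query/key/value vectors (lists of lists).
    u:       hypothetical pairwise matrix, u[i][j] for particles i and j.
    Setting u to all zeros recovers plain scaled dot-product attention.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Dot-product score plus the pairwise term for each (i, j) pair.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) + u[i][j]
            for j in range(len(k))
        ]
        w = softmax(scores)
        # Attention-weighted sum of the value vectors.
        out.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return out


# Toy example with two particles and two-dimensional features.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
plain = attention_with_pair_bias(q, k, v, [[0.0, 0.0], [0.0, 0.0]])
biased = attention_with_pair_bias(q, k, v, [[0.0, 5.0], [0.0, 0.0]])
```

In this toy case the values form an identity matrix, so each output row equals the attention weights. With a zero bias, particle 0 attends mostly to itself; a large `u[0][1]` shifts its attention toward particle 1, which is the sense in which a pairwise term can steer attention beyond what the per-particle features alone provide.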