Jet tagging at the Large Hadron Collider increasingly relies on deep learning models trained on massive simulated datasets, leading to high computational costs and limited robustness to detector mismodeling. We introduce JetParticle-JEPA (JP-JEPA), a self-supervised Joint-Embedding Predictive Architecture that learns physically meaningful jet representations directly from continuous particle clouds without tokenization or reconstruction of raw inputs. Built on a Particle Transformer backbone, JP-JEPA predicts latent representations of masked particles while preserving fine-grained kinematic correlations. On the JetClass benchmark, JP-JEPA achieves performance comparable to fully supervised state-of-the-art methods on the full dataset, surpasses supervised baselines in low-label regimes, and significantly outperforms existing self-supervised learning (SSL) approaches. On the Top Quark Tagging and Quark-Gluon Tagging benchmarks, it remains on par with supervised methods. The learned representations are also robust to missing detector information and show improved uncertainty behavior, highlighting JP-JEPA as a promising foundation-model framework for robust and data-efficient jet physics at the LHC.
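The idea of predicting latent representations of masked particles, rather than reconstructing raw inputs, can be illustrated with a minimal forward-pass sketch. This is not the JP-JEPA implementation: the abstract does not specify the masking scheme, the predictor, or how the targets are produced. The sketch assumes a generic JEPA-style setup with an online encoder, a target encoder updated by exponential moving average (EMA), and a predictor queried with the masked particle's angular coordinates. Single linear layers stand in for the Particle Transformer backbone, and every name here (`split_context_target`, `jepa_forward`, `ema_update`, the `(pt, eta, phi)` feature layout) is hypothetical.

```python
import random

def split_context_target(n_particles, mask_ratio, rng):
    """Randomly split particle indices into visible (context) and masked sets.

    Assumes n_particles >= 2 so that both sets are non-empty.
    """
    idx = list(range(n_particles))
    rng.shuffle(idx)
    n_masked = max(1, min(n_particles - 1, round(mask_ratio * n_particles)))
    return sorted(idx[n_masked:]), sorted(idx[:n_masked])

def linear(weights, x):
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def latent_loss(pred, target):
    """Mean squared error computed in embedding space, not on raw features."""
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    return total / count

def ema_update(target_w, online_w, momentum):
    """Move target-encoder weights toward the online encoder (assumed EMA rule)."""
    return [[momentum * t + (1.0 - momentum) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target_w, online_w)]

def jepa_forward(jet, online_w, target_w, pred_w, mask_ratio, rng):
    """One forward pass: encode visible particles, predict masked embeddings."""
    context, masked = split_context_target(len(jet), mask_ratio, rng)
    dim = len(online_w)
    # Online encoder sees only the visible particles; mean-pool as a summary.
    ctx = [linear(online_w, jet[i]) for i in context]
    summary = [sum(e[k] for e in ctx) / len(ctx) for k in range(dim)]
    # Predictor input: context summary plus the masked particle's (eta, phi).
    preds = [linear(pred_w, summary + list(jet[i][1:3])) for i in masked]
    # Targets come from the target encoder applied to the masked particles.
    targets = [linear(target_w, jet[i]) for i in masked]
    return latent_loss(preds, targets)

# Toy jet: 8 particles with hypothetical (pt, eta, phi) features.
rng = random.Random(0)
jet = [[rng.uniform(1, 100), rng.uniform(-2, 2), rng.uniform(-3.14, 3.14)]
       for _ in range(8)]
dim = 4
online_w = [[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(dim)]
target_w = [row[:] for row in online_w]  # target starts as a copy
pred_w = [[rng.gauss(0, 0.1) for _ in range(dim + 2)] for _ in range(dim)]

loss = jepa_forward(jet, online_w, target_w, pred_w, mask_ratio=0.5, rng=rng)
target_w = ema_update(target_w, online_w, momentum=0.99)
print(f"latent prediction loss: {loss:.4f}")
```

In a real training loop, the loss would be backpropagated through the online encoder and predictor only, with the target encoder receiving no gradients. The sketch omits that step to stay dependency-free.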