In high energy physics, as in many other fields of science, machine learning techniques have been crucial in advancing our understanding of fundamental phenomena. Increasingly, deep learning models are applied to analyze both simulated and experimental data. Most experiments have a rigorous regime of tests for physically motivated systematic uncertainties in place. Evaluating these tests numerically for differences between data and simulation quantifies the effect of potential sources of mismodelling on the machine learning output. In addition, marginal distributions and (linear) feature correlations are compared thoroughly between data and simulation in "control regions". However, guidance by physical motivation, and the need to restrict comparisons to specific regions, do not guarantee that all possible sources of deviation have been accounted for. We therefore propose a new adversarial attack, the CONSERVAttack, designed to exploit the remaining space of hypothetical deviations between simulation and data after the above-mentioned tests. The resulting adversarial perturbations stay within the uncertainty bounds, and thus evade standard validation checks, while successfully fooling the underlying model. We further propose strategies to mitigate such vulnerabilities and argue that robustness to adversarial effects must be considered when interpreting results from deep learning in particle physics.
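The general idea of a perturbation that stays inside an uncertainty bound while degrading a model can be illustrated with a toy example. The sketch below is not the CONSERVAttack itself, whose construction is not given here; it is a generic projected sign-gradient attack on a logistic classifier. The names `bounded_attack`, `bce`, `sigma`, and `step_frac` are hypothetical, and the only constraint enforced is a per-feature bound on each event's shift. It does not check marginal distributions or feature correlations.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(x, y, w, b):
    """Mean binary cross-entropy of a logistic model on (x, y)."""
    p = sigmoid(x @ w + b)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def bounded_attack(x, y, w, b, sigma, n_steps=20, step_frac=0.1):
    """Projected sign-gradient attack on a toy logistic model.

    x: (n, d) features, y: (n,) labels in {0, 1}, w: (d,) weights, b: bias.
    sigma: (d,) assumed per-feature bound on the absolute perturbation,
           standing in for an uncertainty band (illustrative only).
    """
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        p = sigmoid((x + delta) @ w + b)
        # Gradient of the cross-entropy with respect to the inputs.
        grad = (p - y)[:, None] * w[None, :]
        # Step in the loss-increasing direction, scaled per feature.
        delta = delta + step_frac * sigma * np.sign(grad)
        # Project back into the allowed band [-sigma, +sigma].
        delta = np.clip(delta, -sigma, sigma)
    return x + delta


# Toy usage with made-up numbers.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
w = np.array([1.0, -2.0, 0.5])
b = 0.1
y = (rng.random(200) < sigmoid(x @ w + b)).astype(float)
sigma = np.array([0.10, 0.05, 0.20])

x_adv = bounded_attack(x, y, w, b, sigma)
print("loss before:", bce(x, y, w, b))
print("loss after: ", bce(x_adv, y, w, b))
print("max |shift| per feature:", np.abs(x_adv - x).max(axis=0))
```

In this toy setting every event is shifted by at most `sigma` per feature, yet the loss of the model increases. An attack aimed at evading the validation checks described above would need additional constraints on the population-level distributions, which this sketch omits.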