Conformal calibration and look-elsewhere effect in anomaly detection for new-physics searches

Machine-learned anomaly detection is reshaping searches for new physics, but it has outrun the statistics used to interpret it. A raw anomaly score has no calibrated meaning, a model that scans many regions inflates the look-elsewhere effect, and the asymptotic significances the field relies on are blind to the background mismodelling that anomaly detectors are especially prone to. We propose a calibration layer, built on conformal prediction, that turns any anomaly score into a defensible significance with distribution-free, finite-sample guarantees. Conformal prediction converts scores into valid local p-values, weighted and Mondrian variants repair the sideband-to-signal-region exchangeability failures that resonant searches suffer, and a Gross-Vitells step carries the result through to a look-elsewhere-aware global significance. The layer does two things at once: it exposes miscalibration that the standard pipeline cannot see, and it corrects that miscalibration without retraining the detector. On public LHC Olympics data, a classifier develops a substructure-mass correlation that makes sideband-calibrated background p-values anti-conservative. Taken at face value, this manufactures a $\sim 46\sigma$ excess from background sculpting alone, which the label-free weighted correction removes, restoring an honest null. When run as a blind wide-mass bump hunt, the standard asymptotic and unweighted procedures fabricate $\gtrsim 10\sigma$ excesses and $\approx 5\sigma$ excesses even in signal-free windows, while the conformal layer raises no false alarms and its global false-positive rate is verified on background-only pseudoexperiments. The result is an auditable, detector-agnostic path from an uncalibrated score to a trials-factor-aware significance, ready to be folded into experimental anomaly searches.
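To make the score-to-p-value step concrete, the following is a minimal sketch of one common form of the conformal p-value, with an optional weighted variant. It is an illustration under stated assumptions, not the implementation used in this work: the function name `conformal_pvalues` and its arguments are hypothetical, larger scores are assumed to be more anomalous, the calibration scores are assumed to come from a background-only sample such as a sideband, and the weights stand in for an estimated signal-region-to-sideband density ratio that is taken as given. The Mondrian variant and the Gross-Vitells global step are not covered.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores, cal_weights=None, test_weights=None):
    """Conformal p-values of test scores against background calibration scores.

    Assumption: a larger score means a more anomalous event.

    Unweighted: p(s) = (1 + #{cal_i >= s}) / (n + 1)
    Weighted:   p(s) = (sum_i w_i * 1[cal_i >= s] + w_test) / (sum_i w_i + w_test)
    """
    cal = np.asarray(cal_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    # Unit weights reproduce the unweighted formula exactly.
    w = np.ones_like(cal) if cal_weights is None else np.asarray(cal_weights, dtype=float)
    w_t = np.ones_like(test) if test_weights is None else np.asarray(test_weights, dtype=float)

    order = np.argsort(cal)
    cal_sorted = cal[order]
    # tail[k] is the total weight of cal_sorted[k:]; tail[n] = 0.
    tail = np.concatenate([np.cumsum(w[order][::-1])[::-1], [0.0]])
    # Index of the first calibration score that is >= each test score.
    idx = np.searchsorted(cal_sorted, test, side="left")
    return (tail[idx] + w_t) / (w.sum() + w_t)


# Toy usage with made-up numbers (not data from this work).
cal = [1.0, 2.0, 3.0, 4.0]
p_unweighted = conformal_pvalues(cal, [2.5, 5.0, 0.0])
p_weighted = conformal_pvalues(cal, [2.5], cal_weights=[1, 1, 2, 2], test_weights=[2])
```

With unit weights, the smallest attainable p-value is 1/(n + 1), so the size of the calibration sample bounds the local significance that can be claimed. In the weighted call, calibration events that look more like the signal region count for more, which is the kind of reweighting the abstract refers to for repairing sideband-to-signal-region exchangeability failures.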
