We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis. By testing for statistical independence of the two discriminating dataset regions, we can exclude the background-only hypothesis without relying on fixed anomaly score cuts or on extrapolations of background estimates between regions. The method relies on the assumption that the anomaly score features and the dataset regions are conditionally independent, which can be ensured using existing decorrelation techniques. As a benchmark example, we consider the LHC Olympics dataset, where we show that mutual information is a suitable test statistic for statistical independence and that our method exhibits excellent and robust performance at different signal fractions, even in the presence of realistic feature correlations.
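To make the idea of an independence test between an anomaly score and a region label concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a binary region label (0 for sideband, 1 for signal region), a histogram-based plug-in estimate of mutual information with quantile binning, and a permutation test to calibrate the null distribution. The function names `mutual_information` and `permutation_p_value`, the bin count, and the toy scores are all hypothetical choices made for illustration; in a real analysis the scores would come from a trained classifier.

```python
import numpy as np


def mutual_information(scores, region, n_bins=10):
    """Plug-in estimate (in nats) of I(score; region) from a 2D histogram.

    Assumption: `region` is an integer array with values in {0, 1}.
    """
    # Quantile bin edges, so each score bin is roughly equally populated.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_bins - 1)

    # Joint histogram of (score bin, region), normalised to a probability table.
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (idx, region), 1.0)
    joint /= joint.sum()

    # Product of marginals, i.e. the joint distribution under independence.
    p_score = joint.sum(axis=1, keepdims=True)
    p_region = joint.sum(axis=0, keepdims=True)
    indep = p_score @ p_region

    mask = joint > 0  # empty cells contribute zero
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))


def permutation_p_value(scores, region, n_perm=500, seed=0):
    """p-value for the null 'score is independent of region'.

    Shuffling the region labels destroys any dependence, so the shuffled
    mutual information values sample the null distribution.
    """
    rng = np.random.default_rng(seed)
    observed = mutual_information(scores, region)
    null = np.array(
        [mutual_information(scores, rng.permutation(region)) for _ in range(n_perm)]
    )
    # The +1 terms keep the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 4000
    region = rng.integers(0, 2, size=n)

    # Toy background-only scores: same distribution in both regions.
    bkg_scores = rng.uniform(size=n)
    print("background only:", permutation_p_value(bkg_scores, region, n_perm=200))

    # Toy signal injection: a small fraction of signal-region events gets high scores.
    sig_scores = bkg_scores.copy()
    is_signal = (region == 1) & (rng.uniform(size=n) < 0.05)
    sig_scores[is_signal] = rng.uniform(0.9, 1.0, size=is_signal.sum())
    print("with signal:    ", permutation_p_value(sig_scores, region, n_perm=200))
```

A small p-value indicates that the score distribution differs between the two regions, which under the conditional-independence assumption stated above would disfavour the background-only hypothesis. The plug-in estimator is biased upward at finite sample size, which is one reason this sketch calibrates against shuffled labels instead of comparing the estimate to zero.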