Anomaly detection aims to detect unexpected behaviours in data. Because anomaly detection is usually an unsupervised task, traditional anomaly detectors learn a decision boundary using heuristics based on intuitions that are hard to verify in practice. This introduces uncertainty, especially close to the decision boundary, which may reduce user trust in the detector's predictions. One way to combat this is to allow the detector to reject examples with high uncertainty (Learning to Reject). Doing so requires a confidence metric that captures the distance to the decision boundary and a rejection threshold below which predictions are rejected. However, selecting a proper metric and setting the rejection threshold without labels are both challenging. In this paper, we address these challenges by setting a constant rejection threshold on the stability metric computed by ExCeeD. Our insight relies on a theoretical analysis of this metric. Moreover, setting a constant threshold yields strong guarantees: we estimate the test rejection rate, and we derive theoretical upper bounds for both the rejection rate and the expected prediction cost. Experimentally, we show that our method outperforms some metric-based methods.
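To make the rejection mechanism concrete, the following is a minimal sketch of thresholding a confidence score with a constant cut-off. It is an illustration under stated assumptions, not the method's actual implementation: the function name `reject_low_confidence`, the `REJECT` label, the example scores, and the threshold value `0.9` are all hypothetical, and the stability scores are taken as given inputs in [0, 1] rather than computed by ExCeeD.

```python
from typing import List, Sequence, Tuple

# Hypothetical label used to mark a rejected prediction.
REJECT = -1


def reject_low_confidence(
    labels: Sequence[int],
    stability: Sequence[float],
    threshold: float,
) -> Tuple[List[int], float]:
    """Keep a prediction only if its stability exceeds a constant threshold.

    labels:    detector predictions (0 = normal, 1 = anomaly).
    stability: per-example confidence scores in [0, 1], assumed precomputed.
    threshold: constant rejection threshold (an assumed value, not from the paper).

    Returns the predictions with rejections marked, and the observed rejection rate.
    """
    if len(labels) != len(stability):
        raise ValueError("labels and stability must have the same length")
    if not labels:
        return [], 0.0

    # Reject every example whose stability does not exceed the threshold.
    outputs = [
        label if score > threshold else REJECT
        for label, score in zip(labels, stability)
    ]
    rejection_rate = outputs.count(REJECT) / len(outputs)
    return outputs, rejection_rate


# Toy data: the last two examples are treated as lying close to the decision boundary.
outputs, rate = reject_low_confidence(
    labels=[0, 1, 0, 1],
    stability=[0.99, 0.95, 0.60, 0.80],
    threshold=0.9,
)
```

Because the threshold is a constant, it needs no labelled data to tune; the rejection rate observed here is simply the fraction of examples whose score falls at or below it.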