Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art semi-supervised AD performance under labeling budget constraints.