Outlier detection (OD) aims to identify abnormal instances, known as outliers or anomalies, by learning typical patterns of normal data, or inliers. Performing OD in an unsupervised regime, where the training data carry no information about anomalous instances, is challenging. A recently observed phenomenon, known as the inlier-memorization (IM) effect, whereby deep generative models (DGMs) tend to memorize inlier patterns during early training, provides a promising signal for distinguishing outliers. However, existing unsupervised approaches that rely solely on the IM effect still struggle when inliers and outliers are not well separated or when outliers form dense clusters. To address these limitations, we incorporate active learning to selectively acquire informative labels and propose IMBoost, a novel framework that explicitly reinforces the IM effect to improve outlier detection. Our method consists of two stages: 1) a warm-up phase that induces and promotes the IM effect, and 2) a polarization phase in which actively queried samples are used to maximize the discrepancy between inlier and outlier scores. In particular, we propose a novel query strategy and a tailored loss function for the polarization phase to effectively identify informative samples and fully leverage the limited labeling budget. We provide a theoretical analysis showing that IMBoost consistently decreases inlier risk while increasing outlier risk throughout training, thereby amplifying their separation. Extensive experiments on diverse benchmark datasets demonstrate that IMBoost not only significantly outperforms state-of-the-art active OD methods but also incurs substantially lower computational cost.
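The two-stage recipe above can be illustrated with a minimal, self-contained sketch. This is not the paper's actual method: it substitutes a toy linear autoencoder trained with NumPy for a deep generative model, uses squared reconstruction error as the outlier score, and stands in for the proposed query strategy and tailored loss with a simple "descend on queried inliers, ascend on queried outliers" update. All variable names, learning rates, and step counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed, for illustration): inliers near the origin, outliers in a dense far cluster.
inliers = rng.normal(0.0, 0.5, size=(200, 2))
outliers = rng.normal(4.0, 0.5, size=(20, 2))
X = np.vstack([inliers, outliers])

# A linear "autoencoder" W (2x2); squared reconstruction error plays the role of the outlier score.
W = 0.1 * rng.normal(size=(2, 2))

def scores(W, X):
    # Per-sample reconstruction error: higher score = more outlier-like.
    return np.sum((X @ W - X) ** 2, axis=1)

def grad(W, X):
    # Gradient of mean squared reconstruction error w.r.t. W.
    return 2.0 * X.T @ (X @ W - X) / len(X)

# 1) Warm-up phase: brief unsupervised training on all data.
#    Early stopping is what loosely mimics the IM effect here.
for _ in range(10):
    W -= 0.05 * grad(W, X)

gap_warm = scores(W, outliers).mean() - scores(W, inliers).mean()

# 2) Polarization phase: a small "queried" labeled budget. Descend on queried
#    inliers (lower their scores) and ascend on queried outliers (raise theirs),
#    a crude stand-in for IMBoost's tailored polarization loss.
queried_in, queried_out = inliers[:10], outliers[:5]
for _ in range(20):
    W -= 0.003 * (grad(W, queried_in) - grad(W, queried_out))

gap = scores(W, outliers).mean() - scores(W, inliers).mean()
print(gap > gap_warm)  # the score gap widens after polarization
```

The sign flip on the queried-outlier gradient is the whole point of the sketch: it mirrors the abstract's claim that training should decrease inlier risk while increasing outlier risk, so the two score distributions are pushed apart rather than both being fit equally well.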