Enhancing Anomaly Detection via Generating Diversified and Hard-to-distinguish Synthetic Anomalies

Unsupervised anomaly detection is challenging because it must identify unseen anomalies at test time using only the normality patterns present in the training data. Recent approaches leverage domain-specific transformations or perturbations to generate synthetic anomalies from normal samples; the objective is to learn normality patterns by training a model to differentiate normal samples from these crafted anomalies. However, these approaches run into limitations when domain-specific transformations are not well specified, as in tabular data, or when the synthetic anomalies become trivial to distinguish from normal samples. To address these issues, we introduce a novel domain-agnostic method that employs a set of conditional perturbators and a discriminator. The perturbators are trained to generate input-dependent perturbations, which are then used to construct synthetic anomalies, and the discriminator is trained to distinguish normal samples from these synthetic anomalies. We ensure that the generated anomalies are both diverse and hard to distinguish through two key strategies: i) directing the perturbations to be orthogonal to each other and ii) constraining the perturbations so that the resulting anomalies remain close to normal samples. Through experiments on real-world datasets, we demonstrate the superiority of our method over state-of-the-art benchmarks, not only on image data but also on tabular data, where domain-specific transformations are not readily available. Additionally, we empirically confirm that our method adapts to semi-supervised settings, where it can incorporate supervised signals to further improve anomaly detection performance.
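
The two strategies named in the abstract, orthogonality and proximity, can be illustrated with a small sketch. The abstract does not give the loss functions, so everything below is an assumption made for illustration: the squared-cosine form of the orthogonality penalty, the hinge form of the proximity penalty, the `radius` parameter, and all function names are hypothetical. In the method as described, the perturbations would come from trained conditional perturbators; here they are hand-picked vectors.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def orthogonality_penalty(perturbations, eps=1e-12):
    """Assumed form: mean squared cosine similarity over all pairs.

    The value is 0 when the perturbations are mutually orthogonal and
    approaches 1 when they all point along the same line.
    """
    total, pairs = 0.0, 0
    for i in range(len(perturbations)):
        for j in range(i + 1, len(perturbations)):
            u, v = perturbations[i], perturbations[j]
            cos = dot(u, v) / (norm(u) * norm(v) + eps)
            total += cos * cos
            pairs += 1
    return total / pairs if pairs else 0.0


def proximity_penalty(perturbations, radius=1.0):
    """Assumed form: mean hinge penalty on norms that exceed `radius`.

    Perturbations inside the radius cost nothing, so the synthetic
    anomalies are encouraged to stay near the normal sample.
    """
    excess = [max(0.0, norm(d) - radius) for d in perturbations]
    return sum(excess) / len(excess)


def synthetic_anomalies(x, perturbations):
    """Build one synthetic anomaly per perturbation by adding it to x."""
    return [[xi + di for xi, di in zip(x, d)] for d in perturbations]


# Hypothetical normal sample and two hand-picked perturbations.
x = [1.0, 2.0]
deltas = [[0.5, 0.0], [0.0, 0.5]]
anomalies = synthetic_anomalies(x, deltas)
diversity_cost = orthogonality_penalty(deltas)
proximity_cost = proximity_penalty(deltas, radius=1.0)
```

Under these assumptions, a training loop would add both penalties to the perturbators' objective while the discriminator learns to separate `x` from `anomalies`; how the actual method weights or combines these terms is not stated in the abstract.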
