Autoencoders have been used extensively in recent anomaly detection techniques. Their use rests on the premise that, once an autoencoder has been trained on normal data, anomalous inputs will exhibit a large reconstruction error, which enables a clear separation between normal and anomalous samples. In practice, however, autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some anomalous samples. To improve performance, various techniques introduce additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach minimizes the need for hyperparameter tuning and customization for new applications, which, together with its permissive constraints on data modality, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that it matches and frequently outperforms alternatives. We also provide a theoretical analysis and numerical simulations that illustrate the process that unfolds during training and how it can help with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, failure cases, and potential new directions.
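As a rough illustration of the objective described above, the following minimal sketch combines a reconstruction loss with a term on the norm of the latent representations. The abstract does not state the exact form of the term, so the squared L2 norm, the weight `beta`, and the function names used here are assumptions made for illustration only, not the authors' formulation. The encoder and decoder are also omitted: the function simply takes their outputs as plain lists.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_norm(v: Vector) -> float:
    """Squared L2 norm of a vector."""
    return sum(x * x for x in v)


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Squared error between an input and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def regularized_loss(
    inputs: Sequence[Vector],
    reconstructions: Sequence[Vector],
    latents: Sequence[Vector],
    beta: float = 0.1,  # hypothetical weight of the norm term (assumption)
) -> float:
    """Batch-averaged reconstruction loss plus beta times the batch-averaged
    squared latent norm. The squared-norm form is an assumed choice."""
    n = len(inputs)
    if n == 0 or len(reconstructions) != n or len(latents) != n:
        raise ValueError("batches must be non-empty and of equal size")
    recon = sum(
        reconstruction_error(x, r) for x, r in zip(inputs, reconstructions)
    ) / n
    norm_term = sum(squared_norm(z) for z in latents) / n
    return recon + beta * norm_term
```

At test time, `reconstruction_error` could serve as a per-sample anomaly score, in line with the premise that anomalous inputs reconstruct poorly. Whether the method scores samples this way is not stated in the abstract.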