Targeted collapse regularized autoencoder for anomaly detection: black hole at the center

Autoencoders have been extensively used in the development of recent anomaly detection techniques. Their application rests on the premise that, after the autoencoder is trained on normal data, anomalous inputs will exhibit a significant reconstruction error, which in turn enables a clear differentiation between normal and anomalous samples. In practice, however, autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some anomalous samples. To improve performance, various techniques propose additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach reduces the need for hyperparameter tuning and customization for new applications which, paired with its permissive data modality constraint, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that the technique matches and frequently outperforms alternatives. We also provide a theoretical analysis and numerical simulations that help demonstrate the underlying process that unfolds during training and how it can help with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, failure cases, and potential new directions.
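As a rough illustration of the idea in the abstract, the sketch below adds a latent-norm penalty to a reconstruction loss for a toy linear autoencoder. The abstract does not give the exact form of the term, so the squared L2 norm, the weight `beta`, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sample_terms(x, w_enc, w_dec):
    """Return (reconstruction error, squared latent norm) for one sample."""
    z = matvec(w_enc, x)        # latent representation
    x_hat = matvec(w_dec, z)    # reconstruction of the input
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    latent = sum(c ** 2 for c in z)
    return recon, latent


def regularized_loss(batch, w_enc, w_dec, beta=0.1):
    """Mean reconstruction error plus beta times the mean squared latent norm.

    Both beta and the squared L2 form of the penalty are assumptions.
    """
    terms = [sample_terms(x, w_enc, w_dec) for x in batch]
    recon = sum(t[0] for t in terms) / len(terms)
    latent = sum(t[1] for t in terms) / len(terms)
    return recon + beta * latent


# Toy weights: the encoder keeps the first two of three input features.
w_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
batch = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]

loss = regularized_loss(batch, w_enc, w_dec, beta=0.1)
```

Setting `beta=0` recovers the plain reconstruction loss, so the extra term is the only change relative to a standard autoencoder objective in this sketch.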

