Because anomaly detection is unsupervised, the key to training deep models is finding supervisory signals. Unlike current reconstruction-guided generative models and transformation-based contrastive models, we devise a novel data-driven supervision for tabular data by introducing a characteristic, scale, as the data label. We represent varied sub-vectors of data instances and define scale as the relationship between the dimensionality of an original sub-vector and that of its representation. Scales serve as labels attached to the transformed representations, thus offering ample labeled data for neural network training. This paper further proposes a scale learning-based anomaly detection method. Supervised by the learning objective of scale distribution alignment, our approach learns the ranking of representations converted from varied subspaces of each data instance. Through this proxy task, our approach models the inherent regularities and patterns within data, which describe data "normality" well. The anomaly degree of a test instance is obtained by measuring how well it fits these learned patterns. Extensive experiments show that our approach yields significant improvement over state-of-the-art generative and contrastive anomaly detection methods.
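To make the idea of scale labels concrete, the following is a minimal sketch of how labeled samples might be generated from a single tabular instance. It is not the paper's implementation: the helper name `make_scale_samples`, the sliding-window choice of sub-vectors, the random linear projections, and the use of the window-size index as the label are all assumptions made for illustration. The abstract states only that sub-vectors of varied dimensionality are mapped to representations and labeled by scale.

```python
import numpy as np

def make_scale_samples(x, window_sizes, rep_dim, rng):
    """Build (representation, scale_label) pairs from one tabular instance.

    Hypothetical helper: for each window size, slide over the feature
    vector, project every sub-vector to `rep_dim` dimensions with a random
    linear map, and label it with the index of its window size.
    """
    reps, labels = [], []
    for label, w in enumerate(window_sizes):
        # Assumption: one fixed random projection per sub-vector length,
        # so every representation has the same dimensionality.
        proj = rng.standard_normal((w, rep_dim))
        for start in range(len(x) - w + 1):
            sub = x[start:start + w]      # sub-vector of dimensionality w
            reps.append(sub @ proj)       # representation of size rep_dim
            labels.append(label)          # scale label tied to w
    return np.stack(reps), np.array(labels)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                # one instance with 8 features
reps, labels = make_scale_samples(x, window_sizes=[2, 4, 6], rep_dim=3, rng=rng)
print(reps.shape, labels.shape)           # (15, 3) (15,)
```

With 8 features and window sizes 2, 4, and 6, one instance yields 7 + 5 + 3 = 15 labeled representations, which illustrates how a single unlabeled row can supply many training samples. A network trained to predict the scale label from the representation could then score a test instance by how poorly its predictions match the known labels; that scoring step is left out here because the abstract does not specify it.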