We propose Gumbel Noise Score Matching (GNSM), a novel unsupervised method for detecting anomalies in categorical data. GNSM does this by estimating the scores, i.e., the gradients of the log-likelihood with respect to the inputs, of continuously relaxed categorical distributions. We evaluate the method on a suite of tabular anomaly detection datasets, where GNSM achieves consistently high performance across all experiments. We further demonstrate its flexibility by applying it to image data, where the model is tasked with detecting poor segmentation predictions. Images that GNSM ranks as anomalous show clear segmentation failures, and GNSM's outputs correlate strongly with segmentation metrics computed against the ground truth. We outline the score matching training objective used by GNSM and provide an open-source implementation of our work.
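To make "continuously relaxed categorical distributions" concrete, here is a minimal sketch of the standard Gumbel-Softmax relaxation, which maps categorical log-probabilities to a point in the interior of the probability simplex. It illustrates the general technique only. The function name `gumbel_softmax_sample`, the temperature value, and the example probabilities are assumptions made for this sketch, and it is not necessarily how GNSM applies the noise.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed (continuous) sample from a categorical distribution.

    `logits` are unnormalized log-probabilities over the last axis.
    Lower temperatures give samples closer to one-hot vectors.
    """
    # Gumbel(0, 1) noise via the inverse-CDF transform of uniform noise.
    # The lower bound keeps log(u) finite (hypothetical choice of 1e-12).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))

    # Temperature-scaled softmax of the perturbed logits.
    z = (logits + gumbel) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
probs = np.array([0.7, 0.2, 0.1])  # illustrative category probabilities
sample = gumbel_softmax_sample(np.log(probs), temperature=0.5, rng=rng)
```

Because the relaxed sample is a continuous vector rather than a discrete label, a density over such samples has well-defined gradients with respect to its input, which is the quantity a score matching objective estimates.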