Deep generative models with tractable and analytically computable likelihoods, exemplified by normalizing flows, offer an effective basis for anomaly detection through likelihood-based scoring. We demonstrate that, unlike in the image domain where deep generative models frequently assign higher likelihoods to anomalous data, such counterintuitive behavior occurs far less often in tabular settings. We first introduce a domain-agnostic formulation that enables consistent detection and evaluation of the counterintuitive phenomenon, addressing the absence of a precise definition. Through extensive experiments on 47 tabular datasets and 10 CV/NLP embedding datasets in ADBench, benchmarked against 13 baseline models, we show that the phenomenon, as defined, is consistently rare in general tabular data. We further investigate this phenomenon from both theoretical and empirical perspectives, focusing on the roles of data dimensionality and differences in feature correlation. Our results suggest that likelihood-only detection with normalizing flows offers a practical and reliable approach for anomaly detection in tabular domains.
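To make likelihood-based scoring concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single affine flow (a whitening transform fitted by Cholesky decomposition) as a stand-in for a deep normalizing flow, synthetic Gaussian data in place of the ADBench datasets, and hypothetical helper names (`fit_affine_flow`, `log_likelihood`, `auroc`). The anomaly score is the negative log-likelihood; an AUROC well below 0.5 would indicate that anomalies tend to receive higher likelihood than normal data, which is one informal way to read the counterintuitive behavior (the paper's own formulation is not reproduced here).

```python
import numpy as np


def fit_affine_flow(x_train):
    """Fit the affine flow z = L^{-1}(x - mu) on normal training data.

    L is the Cholesky factor of the (slightly regularized) sample covariance.
    This is an illustrative stand-in for a trained deep normalizing flow.
    """
    mu = x_train.mean(axis=0)
    cov = np.cov(x_train, rowvar=False) + 1e-6 * np.eye(x_train.shape[1])
    chol = np.linalg.cholesky(cov)
    return mu, chol


def log_likelihood(x, mu, chol):
    """Exact log p(x) via change of variables with a standard normal base."""
    d = x.shape[1]
    # z = L^{-1}(x - mu), computed with a linear solve instead of an inverse
    z = np.linalg.solve(chol, (x - mu).T).T
    log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)
    # log|det dz/dx| = -log det L = -sum(log diag(L))
    log_det = -np.log(np.diag(chol)).sum()
    return log_base + log_det


def auroc(scores, labels):
    """Probability that a random anomaly outscores a random normal point."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
normal_train = rng.normal(size=(500, 4))
normal_test = rng.normal(size=(200, 4))
anomalies = rng.normal(loc=4.0, size=(50, 4))

mu, chol = fit_affine_flow(normal_train)
x_test = np.vstack([normal_test, anomalies])
labels = np.concatenate([np.zeros(200, dtype=int), np.ones(50, dtype=int)])

# Likelihood-only detection: higher negative log-likelihood means more anomalous
scores = -log_likelihood(x_test, mu, chol)
result = auroc(scores, labels)
print(f"AUROC of likelihood-only scoring: {result:.3f}")
```

In this toy setting the anomalies lie far from the training distribution, so the score separates them easily. The sketch only illustrates the scoring and evaluation mechanics and says nothing about how a deep flow behaves on real tabular data.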