We propose a modified density estimation problem that is highly effective for detecting anomalies in tabular data. Our approach assumes that the density function is relatively stable (has lower variance) around normal samples, a hypothesis we verify empirically on a wide range of real-world data. We then formulate a variance-stabilized density estimation problem that maximizes the likelihood of the observed samples while minimizing the variance of the density around normal samples. To obtain a reliable anomaly detector, we introduce a spectral ensemble of autoregressive models for learning the variance-stabilized distribution. An extensive benchmark on 52 datasets shows that our method achieves state-of-the-art results while alleviating the need for data-specific hyperparameter tuning. Finally, an ablation study demonstrates the importance of each proposed component, and a stability analysis evaluates the robustness of our model.
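
The trade-off between likelihood and density variance can be illustrated with a minimal sketch. It is not the paper's implementation: the independent Gaussian density stands in for the autoregressive models, the penalty is applied to the variance of the log-density (an assumption, since the abstract only says "variance of the density"), and the names `gaussian_log_density`, `variance_stabilized_loss`, and the weight `lam` are hypothetical.

```python
import numpy as np


def gaussian_log_density(x, mean, std):
    """Log-density of independent Gaussians, summed over features.

    Stand-in for a learned density model (assumption for illustration).
    """
    z = (x - mean) / std
    per_feature = -0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)
    return per_feature.sum(axis=1)


def variance_stabilized_loss(log_p, lam=1.0):
    """Negative mean log-likelihood plus lam times the variance of log_p.

    lam is a hypothetical trade-off weight; lam=0 recovers plain
    maximum-likelihood density estimation.
    """
    return -np.mean(log_p) + lam * np.var(log_p)


# Synthetic "normal" samples: 500 rows, 4 tabular features.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))
mean, std = normal.mean(axis=0), normal.std(axis=0)

log_p = gaussian_log_density(normal, mean, std)
loss_plain = variance_stabilized_loss(log_p, lam=0.0)
loss_vs = variance_stabilized_loss(log_p, lam=1.0)

# Anomaly score: negative log-density, so higher means more anomalous.
outlier = np.full((1, 4), 6.0)
score_outlier = -gaussian_log_density(outlier, mean, std)[0]
score_typical = -np.median(log_p)
```

In a trainable model, the same loss would be minimized over the density's parameters, so that the fitted density is both likely and nearly flat on normal samples; a point far from the training data, such as `outlier` here, then receives a much higher anomaly score than a typical sample.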