While self-supervised learning has improved anomaly detection in computer vision and natural language processing, it is unclear whether tabular data can benefit from it. This paper explores the limitations of self-supervision for tabular anomaly detection. To understand why self-supervision falls short in this setting, we conduct several experiments spanning various pretext tasks on 26 benchmark datasets. Our results confirm that representations derived from self-supervision do not improve tabular anomaly detection performance compared to using the raw representations of the data. We show that this is because neural networks introduce irrelevant features, which reduce the effectiveness of anomaly detectors. However, we demonstrate that using a subspace of the neural network's representation can recover performance.
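The mechanism in the last two sentences, where irrelevant features hurt a distance-based detector and a subspace restores it, can be illustrated with a small synthetic sketch. This is not the authors' experimental setup. The data generator, the k-nearest-neighbour scorer (`knn_score`), the AUROC helper, and all sizes are hypothetical choices made for illustration. Normal and anomalous points differ only in two relevant coordinates. The remaining coordinates stand in for irrelevant features that a learned representation might add, and they are identically distributed for both classes. The subspace is chosen with oracle knowledge of which coordinates are relevant. This is an assumption: the abstract does not say how the subspace is selected.

```python
import math
import random


def knn_score(train, x, k=5):
    """Anomaly score: mean Euclidean distance from x to its k nearest training points."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k


def auroc(normal_scores, anomaly_scores):
    """Probability that a random anomaly outscores a random normal point (ties count 1/2)."""
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomaly_scores) * len(normal_scores))


def make_point(rng, shift, n_relevant, n_irrelevant):
    # Relevant coordinates: normals centred at 0, anomalies shifted by `shift`.
    relevant = [rng.gauss(shift, 1.0) for _ in range(n_relevant)]
    # Hypothetical irrelevant coordinates: the same noise for normals and anomalies,
    # so they carry no information about which class a point belongs to.
    irrelevant = [rng.gauss(0.0, 1.0) for _ in range(n_irrelevant)]
    return relevant + irrelevant


def run(n_relevant=2, n_irrelevant=200, seed=0):
    """Return (AUROC in the full space, AUROC in the relevant subspace)."""
    rng = random.Random(seed)
    train = [make_point(rng, 0.0, n_relevant, n_irrelevant) for _ in range(200)]
    test_normal = [make_point(rng, 0.0, n_relevant, n_irrelevant) for _ in range(50)]
    test_anomaly = [make_point(rng, 3.0, n_relevant, n_irrelevant) for _ in range(50)]

    def evaluate(project):
        # Project every point, then score test points against the projected training set.
        proj_train = [project(p) for p in train]
        s_norm = [knn_score(proj_train, project(p)) for p in test_normal]
        s_anom = [knn_score(proj_train, project(p)) for p in test_anomaly]
        return auroc(s_norm, s_anom)

    full = evaluate(lambda p: p)                    # all coordinates, noise included
    subspace = evaluate(lambda p: p[:n_relevant])   # oracle subspace of relevant coordinates
    return full, subspace


if __name__ == "__main__":
    full_auc, sub_auc = run()
    print(f"full space AUROC: {full_auc:.3f}, subspace AUROC: {sub_auc:.3f}")
```

With these settings the noise coordinates dominate every distance in the full space, so the full-space AUROC should fall well below the subspace AUROC. The exact values depend on the seed and the sizes chosen.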