While self-supervised learning has improved anomaly detection in computer vision and natural language processing, it is unclear whether tabular data can benefit from it. This paper explores the limitations of self-supervision for tabular anomaly detection. We conduct several experiments spanning various pretext tasks on 26 benchmark datasets to understand these limitations. Our results confirm that representations derived from self-supervision do not improve tabular anomaly detection performance compared to using the raw representations of the data. We show this is due to neural networks introducing irrelevant features, which reduces the effectiveness of anomaly detectors. However, we demonstrate that using a subspace of the neural network's representation can recover performance.
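The claim that irrelevant features weaken an anomaly detector, and that restricting to a subspace can recover performance, can be illustrated with a toy sketch. The code below is not the paper's experimental setup: the data are synthetic, the "embedding" is simulated by a random orthogonal mixing of two informative directions with 100 noise directions, the detector is a plain k-nearest-neighbour distance score, and the subspace is an oracle one (we assume we know which directions are informative). All function names and parameters are hypothetical.

```python
import numpy as np


def knn_scores(train, test, k=5):
    """Anomaly score: mean Euclidean distance to the k nearest training points."""
    d2 = (
        (test ** 2).sum(axis=1)[:, None]
        + (train ** 2).sum(axis=1)[None, :]
        - 2.0 * test @ train.T
    )
    d2 = np.clip(d2, 0.0, None)  # guard against tiny negatives from rounding
    nearest = np.sort(d2, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)


def auroc(scores, labels):
    """AUROC as the probability that an anomaly outscores a normal point."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def run_demo(seed=0, n_train=500, n_test=200, n_signal=2, n_noise=100):
    """Compare a kNN detector on signal-only, full, and subspace features."""
    rng = np.random.default_rng(seed)
    dim = n_signal + n_noise

    def sample(n, shift):
        # Informative directions: anomalies are shifted away from the origin.
        signal = rng.normal(loc=shift, scale=1.0, size=(n, n_signal))
        # Irrelevant directions: identical distribution for all points.
        noise = rng.normal(loc=0.0, scale=3.0, size=(n, n_noise))
        return np.hstack([signal, noise])

    train = sample(n_train, 0.0)  # training set contains normal points only
    test = np.vstack([sample(n_test, 0.0), sample(n_test, 4.0)])
    labels = np.r_[np.zeros(n_test), np.ones(n_test)]

    # Assumption: simulate a learned embedding with a random orthogonal mixing,
    # so the informative directions are no longer axis-aligned.
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    train_z, test_z = train @ q, test @ q

    # Oracle subspace: the directions of the embedding that carry the signal.
    basis = q[:n_signal].T  # shape (dim, n_signal)

    return {
        "signal_only": auroc(
            knn_scores(train[:, :n_signal], test[:, :n_signal]), labels
        ),
        "full_embedding": auroc(knn_scores(train_z, test_z), labels),
        "subspace": auroc(knn_scores(train_z @ basis, test_z @ basis), labels),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: AUROC = {value:.3f}")
```

In this toy setting the detector on the full embedding should score well below the signal-only baseline, while projecting onto the informative subspace closes the gap. In practice the informative subspace is not known in advance, so how to choose it is the substantive question; the sketch only shows why the choice matters.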