In this paper, we propose a joint generative and contrastive representation learning method (GeCo) for anomalous sound detection (ASD). GeCo uses a Predictive AutoEncoder (PAE) equipped with self-attention as a generative model to perform frame-level prediction. The output of the PAE, together with the original normal samples, is used for supervised contrastive representation learning in a multi-task framework. In addition to a cross-entropy loss between classes, a contrastive loss is used to separate the PAE output from the original samples within each class. The self-attention mechanism of the PAE allows GeCo to better capture context information among frames. Furthermore, by combining generative and contrastive learning, GeCo aims to yield more effective and informative representations than existing methods. Extensive experiments on the DCASE2020 Task2 development dataset show that GeCo outperforms state-of-the-art generative and discriminative methods.
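The multi-task objective described above can be illustrated with a minimal sketch. It assumes a supervised contrastive loss of the commonly used normalized-temperature form and a simple weighted sum with the cross-entropy term; the abstract does not specify the exact formulation, so the function names, the `temperature` and `weight` parameters, and the labelling scheme (class paired with an original-versus-PAE-output flag) are hypothetical choices made for illustration only.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sup_con_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch (assumed form, not the paper's).

    Samples sharing a label are pulled together; all others are pushed apart.
    Anchors without any positive in the batch are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def joint_loss(logits, class_ids, embeddings, is_predicted, weight=1.0):
    """Cross-entropy between classes plus a weighted contrastive term.

    Assumption: the contrastive label is (class, is_predicted), so PAE outputs
    and original samples of the same class form separate groups.
    """
    ce = sum(cross_entropy(l, c) for l, c in zip(logits, class_ids)) / len(logits)
    con_labels = list(zip(class_ids, is_predicted))
    return ce + weight * sup_con_loss(embeddings, con_labels)


if __name__ == "__main__":
    # Toy batch: one class, two original samples and two PAE predictions.
    logits = [[2.0, 0.0], [1.5, 0.1], [1.8, 0.2], [2.1, 0.0]]
    class_ids = [0, 0, 0, 0]
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    is_predicted = [False, False, True, True]
    print(joint_loss(logits, class_ids, embeddings, is_predicted))
```

In this sketch the contrastive term becomes small when original samples and PAE outputs of the same class occupy distinct regions of the embedding space, which mirrors the separation goal stated in the abstract; how the two terms are actually balanced in GeCo is not stated here.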