Existing contrastive learning methods for anomalous sound detection refine the representation of each audio sample by contrasting augmented views of that sample (e.g., with time or frequency masking). However, they may be biased by the augmented data, because the augmentations do not reflect the physical properties of machine sound, and this limits detection performance. This paper instead uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample. The proposed two-stage method first pretrains the audio representation model with contrastive learning that incorporates machine ID, then fine-tunes the learnt model with a self-supervised ID classifier while enhancing the relation between audio features from the same ID. Experiments on the DCASE 2020 Challenge Task 2 dataset show that our method outperforms state-of-the-art methods based on contrastive learning or self-supervised classification in both overall anomaly detection performance and stability.
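To make the idea of contrasting by machine ID rather than by sample concrete, the following is a minimal sketch of one possible ID-level contrastive loss. It is not the paper's exact formulation, which the abstract does not specify. It assumes a supervised-contrastive-style objective in which clips sharing a machine ID are positives and all other clips in the batch are negatives. The function name `id_contrastive_loss`, the `temperature` value, and the toy embeddings are hypothetical.

```python
import math


def id_contrastive_loss(embeddings, machine_ids, temperature=0.1):
    """Supervised-contrastive-style loss with machine ID as the label.

    Assumption: clips from the same machine ID are positives and all
    other clips in the batch are negatives. Illustrative only.
    """
    # L2-normalise each embedding so dot products are cosine similarities.
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    losses = []
    n = len(unit)
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other clip.
        sims = {
            j: sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if machine_ids[j] == machine_ids[i]]
        if not positives:
            continue  # an anchor with no same-ID partner contributes nothing

        # Numerically stable log of the softmax denominator.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))

        # Average negative log-probability over the anchor's positives.
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))

    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two clips near each of two directions in a 2-D embedding space.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching_ids = id_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched_ids = id_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the ID labels agree with the geometry of the embeddings, which is the behaviour an ID-level pretraining objective would reward. In practice such a loss would be computed on encoder outputs inside an automatic-differentiation framework rather than in plain Python.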