Unsupervised anomalous sound detection (ASD) aims to identify anomalous sounds by learning the features of normal operational sounds and detecting deviations from them. Recent approaches have focused on self-supervised tasks that classify normal data, and advanced models have shown that it is important to reserve representation space for anomalous data through representation learning that yields compact intra-class and well-separated inter-class distributions. However, we show that conventional approaches often fail to ensure sufficient intra-class compactness and exhibit angular disparity between samples and their corresponding centers. In this paper, we propose a training technique aimed at ensuring intra-class compactness and increasing the angle gap between normal and abnormal samples. Furthermore, we present an architecture that extracts features for important temporal regions, enabling the model to learn which time frames should be emphasized or suppressed. Experimental results on the DCASE 2020 Challenge Task 2 dataset demonstrate that the proposed method achieves the best performance, improving on the state-of-the-art method by 0.90%, 0.83%, and 2.16% in terms of AUC, pAUC, and mAUC, respectively.
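The angular disparity between samples and their corresponding centers mentioned above can be made concrete with a small sketch. The following is an illustration only, not the training technique proposed in the paper: it assumes that each class has a center vector in the embedding space and measures the angle between every embedding and the center of its own class. The function name `angles_to_centers` and the toy arrays are hypothetical.

```python
import numpy as np


def angles_to_centers(embeddings, labels, centers):
    """Return the angle (in degrees) between each embedding and its class center.

    Hypothetical helper for illustration; the shapes are assumptions:
    embeddings is (n_samples, dim), labels is (n_samples,), centers is (n_classes, dim).
    """
    # L2-normalize so the dot product equals the cosine of the angle.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    # Pick the center that belongs to each sample's class.
    cos = np.sum(e * c[labels], axis=1)
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


# Toy data (assumed, not taken from the paper): two classes in a 2-D space.
embeddings = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])

angles = angles_to_centers(embeddings, labels, centers)
print(angles)
```

Under this reading, small angles for all normal samples of a class would correspond to compact intra-class distributions, while a large spread of angles would indicate the kind of disparity the abstract describes.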