Anomalous sound detection (ASD) systems are usually compared using threshold-independent performance measures such as AUC-ROC. However, practical applications require a decision threshold to decide whether a given test sample is normal or anomalous. Estimating such a threshold is highly non-trivial in a semi-supervised setting, where only normal training samples are available. In this work, F1-EV, a novel threshold-independent performance measure for ASD systems that also accounts for the likelihood of estimating a good decision threshold, is proposed and motivated using specific toy examples. In experimental evaluations, multiple performance measures are evaluated for all systems submitted to the ASD task of the DCASE Challenge 2023. It is shown that F1-EV is strongly correlated with AUC-ROC. It is also significantly more strongly correlated than AUC-ROC with the F1-scores obtained using estimated and optimal decision thresholds.
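The following minimal Python sketch illustrates the gap between a threshold-independent measure and a threshold-dependent one. It is not an implementation of F1-EV, which the abstract does not define. The sketch computes three quantities:

- AUC-ROC, as the probability that an anomalous sample receives a higher score than a normal one.
- An F1-score that uses a threshold estimated only from the anomaly scores of normal training samples.
- An "optimal" F1-score. This is an oracle that chooses the threshold using test labels.

All function names are hypothetical. The threshold rule (the nearest-rank 90th percentile of the normal training scores) is an assumption chosen for illustration, not the rule used in the paper or in the DCASE challenge.

```python
import math
from typing import Sequence


def auc_roc(normal: Sequence[float], anomalous: Sequence[float]) -> float:
    """Threshold-independent: P(anomalous score > normal score), ties count 1/2."""
    wins = 0.0
    for a in anomalous:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous) * len(normal))


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1-score when samples with score > threshold are declared anomalous (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def estimate_threshold(train_normal_scores: Sequence[float], q: float = 0.9) -> float:
    """Hypothetical rule: nearest-rank q-quantile of the normal training scores.
    Only normal data is used, as in the semi-supervised setting."""
    ordered = sorted(train_normal_scores)
    rank = min(max(math.ceil(q * len(ordered)), 1), len(ordered))
    return ordered[rank - 1]


def optimal_f1(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Oracle: best F1 over every distinct split of the test scores (needs test labels)."""
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    return max(f1_at_threshold(scores, labels, t) for t in candidates)


if __name__ == "__main__":
    train_normal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    test_normal = [0.2, 0.5, 0.9, 1.2]
    test_anomalous = [0.8, 1.5, 2.0]
    scores = test_normal + test_anomalous
    labels = [0] * len(test_normal) + [1] * len(test_anomalous)

    thr = estimate_threshold(train_normal)
    print(round(auc_roc(test_normal, test_anomalous), 3))  # 0.833
    print(round(f1_at_threshold(scores, labels, thr), 3))  # 0.667
    print(round(optimal_f1(scores, labels), 3))            # 0.8
```

In this toy case, the AUC-ROC stays the same no matter which threshold is chosen. The F1-score obtained with the estimated threshold, however, falls short of the oracle optimum. This is the kind of discrepancy that motivates a measure reflecting how well a usable threshold can be estimated.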