Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated on datasets drawn from different distributions. Previous studies typically address distribution shift by using extra data or applying additional loss constraints during training. However, these methods either require a substantial amount of data or complicate the training process. In this work, we propose a stable learning-based training scheme built around a Sample Weight Learning (SWL) module, which addresses distribution shift by decorrelating all selected features via weights learned from the training samples. The proposed SWL is a portable, plug-in-style module that is easy to attach to multiple base models and improves their generalization without requiring extra data during training. Experiments conducted on the ASVspoof datasets clearly demonstrate the effectiveness of SWL in generalizing different models across three evaluation datasets from different distributions.
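The core idea of stable learning-based sample reweighting can be illustrated with a minimal sketch: learn one weight per training sample such that, under the weighted empirical distribution, the off-diagonal entries of the feature covariance matrix shrink toward zero. The snippet below is an assumption-laden toy illustration, not the paper's SWL module: feature dimensions, the softmax parameterization of the weights, and the use of `scipy.optimize.minimize` are all choices made here for brevity.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
X[:, 1] += 0.8 * X[:, 0]  # inject a spurious correlation between features 0 and 1


def decorrelation_loss(theta, X):
    """Sum of squared off-diagonal entries of the sample-weighted covariance.

    theta: unconstrained per-sample logits; weights are softmax(theta),
    which keeps them non-negative and summing to one.
    """
    w = np.exp(theta - theta.max())
    w = w / w.sum()
    mu = w @ X                          # weighted mean
    Xc = X - mu
    C = Xc.T @ (w[:, None] * Xc)        # weighted covariance matrix
    off_diag = C - np.diag(np.diag(C))
    return np.sum(off_diag ** 2)


# Uniform weights correspond to the ordinary (correlated) empirical covariance.
theta0 = np.zeros(n)
loss_before = decorrelation_loss(theta0, X)

# Learn sample weights that decorrelate the features.
res = minimize(decorrelation_loss, theta0, args=(X,), method="L-BFGS-B")
loss_after = decorrelation_loss(res.x, X)
print(f"off-diagonal covariance energy: {loss_before:.4f} -> {loss_after:.4f}")
```

In the full scheme described in the abstract, such learned weights would then rescale each sample's contribution to the base model's training loss; here we only show the decorrelation objective itself.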