Supervised speech enhancement models are trained using artificially generated mixtures of clean speech and noise signals, which may not match real-world recording conditions at test time. This mismatch can lead to poor performance if the test domain significantly differs from the synthetic training domain. This paper introduces the unsupervised domain adaptation for conversational speech enhancement (UDASE) task of the 7th CHiME challenge. This task aims to leverage real-world noisy speech recordings from the target domain for unsupervised domain adaptation of speech enhancement models. The target domain corresponds to the multi-speaker reverberant conversational speech recordings of the CHiME-5 dataset, for which the ground-truth clean speech reference is unavailable. Given a CHiME-5 recording, the task is to estimate the clean, potentially multi-speaker, reverberant speech, removing the additive background noise. We discuss the motivation for the CHiME-7 UDASE task and describe the data, the task, and the baseline system.