Supervised speech enhancement models are trained using artificially generated mixtures of clean speech and noise signals, which may not match real-world recording conditions at test time. This mismatch can lead to poor performance if the test domain significantly differs from the synthetic training domain. In this paper, we introduce the unsupervised domain adaptation for conversational speech enhancement (UDASE) task of the 7th CHiME challenge. This task aims to leverage real-world noisy speech recordings from the target test domain for unsupervised domain adaptation of speech enhancement models. The target test domain corresponds to the multi-speaker reverberant conversational speech recordings of the CHiME-5 dataset, for which the ground-truth clean speech reference is not available. Given a CHiME-5 recording, the task is to estimate the clean, potentially multi-speaker, reverberant speech, removing the additive background noise. We discuss the motivation for the CHiME-7 UDASE task and describe the data, the task, and the baseline system.
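To make the notion of an "artificially generated mixture" concrete, the following is a minimal sketch of how a noisy training example might be built by adding noise to clean speech at a chosen signal-to-noise ratio. It is an illustration only, not the data-generation procedure of the CHiME-7 UDASE task: the function names `power` and `mix_at_snr`, the sine tone standing in for clean speech, the Gaussian noise, and the 5 dB target are all assumptions made for this example.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a signal given as a list of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add.

    Returns the mixture and the scaled noise. Hypothetical helper, not taken
    from the challenge baseline.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    # SNR_dB = 10 * log10(P_speech / P_noise)  =>  P_noise = P_speech / 10^(SNR_dB / 10)
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Stand-in signals (assumptions): a 220 Hz tone for "speech", white noise for "noise".
rng = random.Random(0)
sample_rate = 16000
speech = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
achieved_snr_db = 10 * math.log10(power(speech) / power(scaled_noise))
```

In a synthetic mixture like this the clean `speech` signal is known and can serve as a supervised training target. For real recordings such as those of CHiME-5, no such reference exists, which is why the task calls for unsupervised adaptation.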