The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, part of the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Unlike previous challenges, we evaluate systems on three diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The goal is for participants to devise a single system that generalizes across different array geometries and use cases without any a priori information. Another departure from earlier CHiME iterations is that participants are allowed to use open-source pre-trained models and datasets. In this paper, we describe the challenge design, motivation, and fundamental research questions in detail. We also present the baseline system, which is fully agnostic to array topology and features multi-channel diarization, channel selection, guided source separation, and a robust ASR model that leverages self-supervised speech representations (SSLR).