The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, part of the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Unlike previous challenges, we evaluate systems on three diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The goal is for participants to devise a single system that can generalize across different array geometries and use cases with no a priori information. Another departure from earlier CHiME iterations is that participants are allowed to use open-source pre-trained models and datasets. In this paper, we describe the challenge design, motivation, and fundamental research questions in detail. We also present the baseline system, which is fully array-topology agnostic and features multi-channel diarization, channel selection, guided source separation, and a robust ASR model that leverages self-supervised speech representations (SSLR).
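To illustrate how a stage such as channel selection can remain array-topology agnostic, the following is a minimal sketch, assuming a simplified envelope-variance criterion: each channel is scored using only its own signal (no microphone positions or array geometry), and the highest-scoring fraction of channels is kept. This is not presented as the baseline's actual implementation; the function names `envelope_variance_scores` and `select_channels`, the frame and hop sizes, the equal-width subbands (standing in for a mel filterbank), and the `keep_fraction` value are all hypothetical choices made for illustration.

```python
import numpy as np


def envelope_variance_scores(signals, frame_len=512, hop=256, n_bands=24):
    """Score channels with a simplified envelope-variance measure (hypothetical helper).

    signals: array of shape (n_channels, n_samples). Only the signals are used,
    never the array geometry. Higher scores suggest channels whose subband
    envelopes are less smeared by reverberation and noise.
    """
    n_ch, n_samp = signals.shape
    if n_samp < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (n_samp - frame_len) // hop
    window = np.hanning(frame_len)
    # Frame every channel at once: shape (n_ch, n_frames, frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signals[:, idx] * window
    mag = np.abs(np.fft.rfft(frames, axis=-1))
    # Pool FFT bins into equal-width subbands: shape (n_ch, n_frames, n_bands)
    bands = np.array_split(mag, n_bands, axis=-1)
    energy = np.stack([b.sum(axis=-1) for b in bands], axis=-1)
    # Log compression makes the measure insensitive to per-channel gain
    log_env = np.log(energy + 1e-10)
    var = log_env.var(axis=1)  # temporal variance per channel and band
    # Normalize each band by its maximum across channels, then average bands
    var = var / (var.max(axis=0, keepdims=True) + 1e-10)
    return var.mean(axis=1)


def select_channels(signals, keep_fraction=0.8):
    """Return sorted indices of the top-scoring channels (hypothetical helper)."""
    scores = envelope_variance_scores(signals)
    n_keep = max(1, int(round(keep_fraction * signals.shape[0])))
    return np.sort(np.argsort(-scores)[:n_keep])
```

Because the score depends only on each channel's own subband envelopes, the same code applies unchanged to a single linear array, a circular array, or several heterogeneous devices, which is the property the abstract emphasizes. Any real system would still need to handle device synchronization and choose the keep fraction empirically.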