This paper describes two intelligibility prediction systems derived from a pretrained noise-robust automatic speech recognition (ASR) model for the second Clarity Prediction Challenge (CPC2). One system is intrusive and leverages the hidden representations of the ASR model. The other is non-intrusive and makes predictions from uncertainty estimates derived from the ASR model. The ASR model is pretrained only on a simulated noisy speech corpus and does not use the CPC2 data. The accurate prediction performance on the CPC2 evaluation therefore suggests that the intelligibility prediction systems are robust to unseen scenarios.