Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as feature extractors for speech quality (SQ) prediction, which is in turn relevant for assessing and training speech enhancement systems for users with normal or impaired hearing. However, why and how quality-related information is encoded well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. It is found that self-supervised representations are useful as input features to non-intrusive prediction models, achieving performance competitive with more complex systems. A detailed analysis of performance across Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals.