Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper embeddings to create cross-domain acoustic features and incorporates speech-signal metadata through a classifier that distinguishes between enhancement methods. Furthermore, MBI-Net+ integrates the hearing-aid speech perception index (HASPI) as a supplementary metric in the objective function to further boost prediction performance. Experimental results demonstrate that MBI-Net+ surpasses several intrusive baseline systems and MBI-Net on the Clarity Prediction Challenge 2023 dataset, validating the effectiveness of incorporating Whisper embeddings, speech metadata, and complementary metrics for HA intelligibility prediction.
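As an illustration of how a supplementary metric such as HASPI might enter an objective function, the following minimal sketch combines an intelligibility error term with a weighted HASPI error term. The mean-squared-error form, the function names, and the `haspi_weight` parameter are assumptions made for illustration only; the abstract does not specify the actual loss used by MBI-Net+.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    intel_pred: Sequence[float],
    intel_true: Sequence[float],
    haspi_pred: Sequence[float],
    haspi_true: Sequence[float],
    haspi_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Primary intelligibility loss plus a weighted auxiliary HASPI loss.

    Assumption: both terms are MSE and are combined by a simple weighted sum.
    """
    primary = mse(intel_pred, intel_true)
    auxiliary = mse(haspi_pred, haspi_true)
    return primary + haspi_weight * auxiliary


# Usage: two utterances with predicted and reference scores scaled to [0, 1].
loss = multitask_loss([0.5, 1.0], [0.5, 0.0], [0.2, 0.2], [0.4, 0.4])
print(f"combined loss: {loss:.4f}")
```

In this sketch, setting `haspi_weight` to zero recovers a single-task objective, which gives a simple way to compare against a model trained on intelligibility scores alone.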