This paper introduces HAAQI-Net, a non-intrusive deep learning-based music audio quality assessment model for hearing aid users. Unlike traditional methods such as the Hearing Aid Audio Quality Index (HAAQI), which require intrusive comparisons against a reference signal, HAAQI-Net offers a more accessible and computationally efficient alternative. Using a Bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features extracted from the pre-trained BEATs model, it predicts HAAQI scores directly from music audio clips and hearing loss patterns. Experimental results demonstrate HAAQI-Net's effectiveness: it achieves a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, while reducing inference time from 62.52 seconds to 2.54 seconds. To further address computational overhead, a knowledge distillation strategy was applied, reducing parameters by 75.85% and inference time by 96.46% while maintaining strong performance (LCC: 0.9071, SRCC: 0.9307, MSE: 0.0091). To expand its capabilities, HAAQI-Net was adapted via fine-tuning to predict subjective human ratings such as the Mean Opinion Score (MOS). This adaptation significantly improved prediction accuracy, as validated through statistical analysis. Furthermore, the robustness of HAAQI-Net was evaluated under varying Sound Pressure Level (SPL) conditions, revealing optimal performance at a reference SPL of 65 dB, with accuracy gradually decreasing as the SPL deviated from this point. These advances in subjective score prediction, SPL robustness, and computational efficiency position HAAQI-Net as a scalable solution for music audio quality assessment in hearing aid applications, contributing to efficient and accurate models in audio signal processing and hearing aid technology.
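For intuition, the non-intrusive prediction described above — pooling frame-level audio embeddings with attention and combining the summary with a hearing-loss pattern to regress a quality score — can be sketched in a few lines. This is an illustrative simplification only: the function names (`attention_pool`, `predict_quality`), the plain softmax pooling standing in for the BLSTM-with-attention backbone, and the linear-plus-sigmoid output head are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def attention_pool(frames: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Collapse (T, D) frame-level features (e.g. BEATs embeddings) into a
    single (D,) clip summary via softmax attention weights (illustrative)."""
    scores = frames @ w                      # one scalar score per frame
    alpha = np.exp(scores - scores.max())    # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ frames                    # attention-weighted average

def predict_quality(frames: np.ndarray, hearing_loss: np.ndarray,
                    w_out: np.ndarray, b: float) -> float:
    """Concatenate the pooled audio summary with the listener's hearing-loss
    pattern, then map through a linear head and a sigmoid so the predicted
    score lies in [0, 1], matching the normalized HAAQI range."""
    pooled = attention_pool(frames, np.ones(frames.shape[1]))
    x = np.concatenate([pooled, hearing_loss])
    return float(1.0 / (1.0 + np.exp(-(w_out @ x + b))))

# Toy usage with random stand-in features and an 8-band audiogram pattern.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))            # 10 frames, 4-dim features
hearing_loss = np.zeros(8)                   # flat (no-loss) audiogram
score = predict_quality(frames, hearing_loss, np.zeros(12), 0.0)
```

With an all-zero output head the sigmoid yields 0.5, the midpoint of the score range; a trained model would instead learn `w_out` and the attention parameters from HAAQI-labeled clips.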