This paper introduces HAAQI-Net, a non-intrusive deep learning model for assessing music quality for hearing aid users. Unlike the Hearing Aid Audio Quality Index (HAAQI), an intrusive metric that requires a clean reference signal, HAAQI-Net uses a Bidirectional Long Short-Term Memory (BLSTM) network with attention. It takes the music sample to be assessed and a hearing loss pattern as input and outputs a predicted HAAQI score. Acoustic features are extracted with the pre-trained Bidirectional Encoder representation from Audio Transformers (BEATs). Compared with ground-truth HAAQI scores, HAAQI-Net's predictions achieve a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064. This accuracy comes with a large reduction in inference time, from 62.52 seconds for HAAQI to 2.54 seconds for HAAQI-Net, making HAAQI-Net an efficient music quality assessment model for hearing aid users.