This paper introduces HAAQI-Net, a non-intrusive deep learning model for assessing music quality for hearing aid users. Unlike the Hearing Aid Audio Quality Index (HAAQI), an intrusive metric that needs a clean reference signal, HAAQI-Net requires no reference and is built on a Bidirectional Long Short-Term Memory (BLSTM) network with attention. It takes the music sample to be assessed and a hearing loss pattern as input and outputs a predicted HAAQI score. Acoustic features are extracted with the pre-trained Bidirectional Encoder representation from Audio Transformers (BEATs). Compared against ground-truth HAAQI scores, HAAQI-Net achieves a Linear Correlation Coefficient (LCC) of 0.9257, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9394, and a Mean Squared Error (MSE) of 0.0080. This accuracy comes with a substantial reduction in inference time, from 62.52 seconds with HAAQI to 2.71 seconds with HAAQI-Net (roughly 23 times faster), which makes HAAQI-Net an efficient music quality assessment model for hearing aid users.
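The abstract describes the pipeline only at a high level (BEATs features plus a hearing loss pattern, a BLSTM with attention, a predicted HAAQI score). The PyTorch sketch below shows one plausible way to wire those pieces together. It is not the authors' configuration: the class name `HaaqiNetSketch`, the 768-dimensional frame features (assumed to be BEATs outputs computed beforehand), the 8-value hearing loss vector (assumed to be audiogram thresholds), concatenation as the fusion method, and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn


class HaaqiNetSketch(nn.Module):
    """Hypothetical HAAQI-Net-style predictor: frame features + hearing loss -> score in [0, 1]."""

    def __init__(self, feat_dim: int = 768, hl_dim: int = 8, hidden: int = 128):
        super().__init__()
        # Assumption: the hearing loss pattern is a fixed-length vector (e.g. audiogram
        # thresholds) projected to a small embedding before fusion.
        self.hl_proj = nn.Linear(hl_dim, 32)
        # Bidirectional LSTM over per-frame features concatenated with the hearing loss embedding.
        self.blstm = nn.LSTM(feat_dim + 32, hidden, batch_first=True, bidirectional=True)
        # Attention scorer: one scalar logit per frame.
        self.attn = nn.Linear(2 * hidden, 1)
        # Regression head on the attention-pooled utterance vector.
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, feats: torch.Tensor, hearing_loss: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim); hearing_loss: (batch, hl_dim)
        hl = self.hl_proj(hearing_loss)                      # (batch, 32)
        hl = hl.unsqueeze(1).expand(-1, feats.size(1), -1)   # repeat the embedding for every frame
        h, _ = self.blstm(torch.cat([feats, hl], dim=-1))    # (batch, frames, 2 * hidden)
        weights = torch.softmax(self.attn(h), dim=1)         # (batch, frames, 1), sums to 1 over frames
        pooled = (weights * h).sum(dim=1)                    # (batch, 2 * hidden)
        # Sigmoid keeps the prediction in HAAQI's 0-1 range.
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # (batch,)
```

For example, `HaaqiNetSketch()(torch.randn(2, 50, 768), torch.rand(2, 8))` returns two scores between 0 and 1. Attention pooling lets the model weight informative frames more heavily than a plain average would, and the sigmoid output matches the bounded range of the HAAQI target.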