Hearing aids are becoming increasingly prevalent. However, optimizing their amplification remains challenging because traditional methods must integrate multiple modular components, which makes the overall process complex to optimize. To address this challenge, we present NeuroAMP, a novel deep neural network for end-to-end, personalized amplification in hearing aids. NeuroAMP takes both spectral features and the listener's audiogram as inputs, and we investigate four architectures for it: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Convolutional Recurrent Neural Network (CRNN), and Transformer. We also introduce Denoising NeuroAMP, an extension that performs noise reduction jointly with amplification to improve performance in real-world scenarios. To improve generalization, we applied a comprehensive data augmentation strategy during training on diverse speech (TIMIT and TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluations using the Hearing Aid Speech Perception Index (HASPI), the Hearing Aid Speech Quality Index (HASQI), and the Hearing Aid Audio Quality Index (HAAQI) show that the Transformer architecture achieves the best performance, with Spearman's rank correlation coefficients (SRCC) of 0.9927 (HASQI) and 0.9905 (HASPI) on TIMIT, and 0.9738 (HAAQI) on the Cadenza Challenge MUSIC dataset. Notably, with our data augmentation strategy, NeuroAMP maintains high performance on unseen datasets (e.g., VCTK and MUSDB18-HQ). Furthermore, on the VoiceBank+DEMAND dataset, Denoising NeuroAMP outperforms both the conventional NAL-R prescription with wide dynamic range compression (NAL-R+WDRC) and a two-stage baseline, achieving a 10% improvement in both HASPI (0.90) and HASQI (0.59). These results highlight the potential of NeuroAMP and Denoising NeuroAMP to substantially improve personalized hearing aid amplification.
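The SRCC values reported above measure how well two sets of scores agree in rank order. A minimal pure-Python sketch of Spearman's rank correlation coefficient follows: tied values receive the average of their positions, and the coefficient is the Pearson correlation of the two rank vectors. The function names and the sample score lists are illustrative assumptions only, not code or data from NeuroAMP.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def srcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-utterance scores, for illustration only.
reference_scores = [0.62, 0.71, 0.55, 0.80, 0.90]
model_scores = [0.60, 0.73, 0.50, 0.78, 0.91]
print(srcc(reference_scores, model_scores))  # → 1.0 (identical rankings)
```

Because SRCC depends only on rank order, two systems can score 1.0 even when their absolute metric values differ, as in the hypothetical example. Average ranks keep the result well defined when scores tie.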