This paper proposes neural networks for compensating sensorineural hearing loss. The aim of the hearing loss compensation task is to transform a speech signal so that speech intelligibility is increased after further processing by a person with a hearing impairment, which is modeled by a hearing loss model. We propose an interpretable model called the dynamic processing network, whose structure is similar to that of a band-wise dynamic compressor. The network is differentiable, so its parameters can be learned to maximize speech intelligibility. More generic models based on convolutional layers were tested as well. The performance of the tested architectures was assessed using the short-time objective intelligibility (STOI) metric with hearing-threshold noise and the hearing-aid speech perception index (HASPI). The dynamic processing network gave a significant improvement in STOI and HASPI over the popular compressive gain prescription rule Camfit. A large enough convolutional network could outperform the interpretable model, at the cost of a larger computational load. Finally, combining the dynamic processing network with a convolutional neural network gave the best results in terms of STOI and HASPI.
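To make the idea of a band-wise dynamic compressor concrete, the following is a minimal sketch of a static soft-knee compression curve applied independently to each frequency band. It is not the dynamic processing network from this paper: the function names, the soft-knee formula, the default knee width, and the per-band parameters (`thresholds_db`, `ratios`, `makeup_db`) are all illustrative assumptions. The point is only that such a curve is a smooth function of a few interpretable parameters per band, which is the kind of structure that gradient-based learning can tune.

```python
import math


def soft_knee_gain_db(level_db, threshold_db, ratio, knee_db=10.0):
    """Gain (dB) of a static soft-knee compressor for one input level.

    Hypothetical helper for illustration; the parameterization is assumed,
    not taken from the paper.
    """
    over = level_db - threshold_db
    if 2.0 * over < -knee_db:
        # Well below the threshold: the level passes through unchanged.
        out_db = level_db
    elif 2.0 * abs(over) <= knee_db:
        # Inside the knee: quadratic blend between the two linear segments.
        out_db = level_db + (1.0 / ratio - 1.0) * (over + knee_db / 2.0) ** 2 / (2.0 * knee_db)
    else:
        # Well above the threshold: the excess level is divided by the ratio.
        out_db = threshold_db + over / ratio
    return out_db - level_db


def compress_bands(band_powers, thresholds_db, ratios, makeup_db, eps=1e-12):
    """Apply an independent compressor plus make-up gain to each band power."""
    compressed = []
    for power, thr, ratio, makeup in zip(band_powers, thresholds_db, ratios, makeup_db):
        level_db = 10.0 * math.log10(power + eps)  # band level in dB
        gain_db = soft_knee_gain_db(level_db, thr, ratio) + makeup
        compressed.append(power * 10.0 ** (gain_db / 10.0))  # gain applied in the power domain
    return compressed
```

In a learnable version, the per-band thresholds, ratios, and make-up gains would be trainable parameters optimized against an intelligibility objective; that training loop is outside the scope of this sketch.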