The cochlear implant (CI) is a successful biomedical device that enables individuals with severe-to-profound hearing loss to perceive sound through electrical stimulation, yet listening in noise remains challenging. Recent advances in deep learning offer promising potential for CI sound coding by integrating visual cues. In this study, an audio-visual speech enhancement (AVSE) module is integrated with the ElectrodeNet-CS (ECS) model to form an end-to-end CI system, AVSE-ECS. Simulations show that the jointly trained AVSE-ECS system achieves high objective speech intelligibility and improves the signal-to-error ratio (SER) by 7.4666 dB over the advanced combination encoder (ACE) strategy. These findings underscore the potential of AVSE-based CI sound coding.
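The SER figure quoted above is commonly defined as the ratio of clean-signal power to the power of the error between the clean signal and the system output, expressed in dB; the exact variant used in the paper is not specified here, so the following is a minimal sketch of that standard definition (function name and toy vectors are illustrative):

```python
import math

def signal_to_error_ratio(reference, estimate):
    """SER in dB: clean-signal power over error power (standard definition,
    assumed here; the paper's exact variant may differ)."""
    signal_power = sum(s * s for s in reference)
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / error_power)

# Toy example: a 1-unit error against a signal of amplitude 2
# gives 10*log10(4) ≈ 6.02 dB.
ser = signal_to_error_ratio([2.0, 0.0, 0.0], [1.0, 0.0, 0.0])
```

A higher SER means the processed output deviates less from the clean reference, so the reported +7.4666 dB indicates the AVSE-ECS output tracks the target stimulation pattern more closely than ACE does.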