Classrooms are particularly challenging listening environments for children with hearing impairments: background noise, multiple talkers, and reverberation all degrade speech perception. These difficulties are greater for children than for adults, yet most deep learning speech separation models for assistive devices are developed using adult voices in simplified, low-reverberation conditions. This overlooks both the higher spectral similarity of children's voices, which weakens separation cues, and the acoustic complexity of real classrooms. We address this gap using MIMO-TasNet, a compact, low-latency, multi-channel architecture suited for real-time deployment in bilateral hearing aids or cochlear implants. We simulated naturalistic classroom scenes with moving child-child and child-adult talker pairs under varying noise and distance conditions. The training strategies tested how well the model adapts to children's speech through spatial cues. To assess data-efficient adaptation, we compared models trained on adult speech, models trained on classroom data, and fine-tuned variants. Results show that adult-trained models perform well in clean scenes, but classroom-specific training greatly improves separation quality. Fine-tuning with only half the classroom data achieved comparable gains, confirming efficient transfer learning. Training with diffuse babble noise further enhanced robustness, and the model preserved spatial awareness while generalizing to unseen distances. These findings demonstrate that spatially aware architectures combined with targeted adaptation can improve speech accessibility for children in noisy classrooms, supporting future on-device assistive technologies.