This paper introduces a speech enhancement solution tailored for on-device use in true wireless stereo (TWS) earbuds. The solution is specifically designed to support conversations in noisy environments while active noise cancellation (ANC) is enabled. The primary challenges for speech enhancement models in this setting are computational complexity, which limits on-device deployment, and latency, which must stay below 3 ms to preserve live conversation. To address these challenges, we evaluated several key design elements: the network architecture and processing domain, the design of the loss functions, the pruning method, and hardware-specific optimization. As a result, we demonstrated substantial improvements in speech enhancement quality over baseline models while simultaneously reducing computational complexity and algorithmic latency.