Keyword spotting accuracy degrades when neural networks are exposed to noisy environments. On-site adaptation to previously unseen noise is crucial for recovering the lost accuracy, and on-device learning is required to ensure that the adaptation happens entirely on the edge device. In this work, we propose a fully on-device domain adaptation system that achieves accuracy gains of up to 14% over already-robust keyword spotting models. We enable on-device learning with less than 10 kB of memory, using only 100 labeled utterances to recover 5% accuracy after adapting to complex speech noise. We demonstrate that domain adaptation can be achieved on ultra-low-power microcontrollers with as little as 806 mJ of energy in only 14 s on always-on, battery-operated devices.