Modern approaches to keyword spotting (KWS) rely on training deep neural networks on large, static datasets with i.i.d. distributions. However, the resulting models tend to underperform when the data distribution shifts, as it often does in real-life applications. This work investigates a simple but effective online continual learning method that updates a keyword spotter on-device via stochastic gradient descent (SGD) as new data becomes available. Unlike previous research, this work focuses on continually learning the same KWS task, which covers most commercial applications. In experiments with dynamic audio streams across different scenarios, this method improves the performance of a pre-trained small-footprint model by 34%. Moreover, experiments demonstrate that, compared to a naive online learning implementation, conditioning model updates on the model's performance on a small hold-out set drawn from the training distribution mitigates catastrophic forgetting.
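The conditional update rule can be illustrated with a minimal sketch. It assumes a linear (logistic) keyword detector on precomputed features and uses hold-out accuracy as the acceptance criterion: an SGD step on each new stream batch is kept only if hold-out accuracy does not drop by more than a tolerance `tol`. The abstract does not specify the model, features, metric, or threshold, so `conditional_online_update`, `make_batch`, the synthetic drifting data, and all hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def accuracy(w, b, X, y):
    # Fraction of examples where the thresholded detector output matches the label.
    preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
    return float(np.mean(preds == y))


def sgd_step(w, b, X, y, lr):
    # One gradient step on the mean binary cross-entropy of a logistic detector.
    p = sigmoid(X @ w + b)
    err = p - y
    grad_w = X.T @ err / len(y)
    grad_b = float(np.mean(err))
    return w - lr * grad_w, b - lr * grad_b


def conditional_online_update(w, b, X_new, y_new, X_hold, y_hold, lr=0.1, tol=0.0):
    # Hypothetical acceptance rule: take an SGD step on the incoming batch, but keep it
    # only if accuracy on the hold-out set (drawn from the training distribution)
    # does not fall more than `tol` below its value before the step.
    base = accuracy(w, b, X_hold, y_hold)
    w_cand, b_cand = sgd_step(w, b, X_new, y_new, lr)
    if accuracy(w_cand, b_cand, X_hold, y_hold) >= base - tol:
        return w_cand, b_cand, True
    return w, b, False  # reject: keep the previous model unchanged


# Usage on a synthetic, drifting stream (illustrative data only).
rng = np.random.default_rng(0)
dim = 8
true_w = rng.normal(size=dim)


def make_batch(n, shift=0.0):
    # Labels follow a fixed rule; `shift` adds covariate drift to mimic a changing stream.
    X = rng.normal(size=(n, dim))
    y = (X @ true_w > 0).astype(int)
    return X + shift, y


X_hold, y_hold = make_batch(200)
w, b = np.zeros(dim), 0.0
accepted = 0
for step in range(100):
    X_new, y_new = make_batch(16, shift=0.02 * step)
    w, b, ok = conditional_online_update(w, b, X_new, y_new, X_hold, y_hold, lr=0.1)
    accepted += ok
print(f"accepted {accepted}/100 updates, hold-out accuracy {accuracy(w, b, X_hold, y_hold):.2f}")
```

The gate costs one extra forward pass over the hold-out set per update. A naive online learner corresponds to always accepting the candidate step.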
翻译:关键词检测的现代方法依赖于在具有独立同分布的大规模静态数据集上训练深度神经网络。然而,当在现实应用中遇到变化的数据分布时,所得到的模型往往表现不佳。本文研究了一种简单但有效的在线持续学习方法,该方法通过在新数据可用时利用随机梯度下降在设备端更新关键词检测器。与以往研究不同,本文聚焦于学习相同关键词检测任务(涵盖了大多数商业应用场景)。在不同场景的动态音频流实验中,该方法将预训练的小型模型的性能提升了34%。此外,实验表明,与朴素在线学习实现相比,基于模型在从训练分布中抽取的小规模保留集上的性能进行条件更新,可以缓解灾难性遗忘。