Keyword Spotting (KWS) systems that deploy small-footprint models on edge devices face significant accuracy and robustness challenges from domain shifts caused by varying noise and recording conditions. To address this, we propose a comprehensive continual learning framework designed to adapt to new domains while maintaining computational efficiency. The proposed pipeline integrates a dual-input Convolutional Neural Network that uses both Mel Frequency Cepstral Coefficients (MFCC) and Mel-spectrogram features, a multi-stage denoising process based on discrete wavelet transform and spectral subtraction, and model and prototype update blocks. Unlike prior methods that restrict updates to specific layers, our approach updates the complete quantized model, which the compact model architecture makes feasible. At runtime, a subset of input samples is selected using class prototypes and confidence-driven filtering; the selected samples are then pseudo-labeled and combined with a rehearsal buffer for incremental model retraining. Experimental results on a noisy test dataset demonstrate the framework's effectiveness: it achieves 99.63% accuracy on clean data and maintains robust performance (exceeding 94% accuracy) across diverse noisy environments, even at -10 dB Signal-to-Noise Ratio. The proposed framework confirms that integrating efficient denoising with prototype-based continual learning enables KWS models to operate autonomously and robustly in resource-constrained, dynamic environments.
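The runtime sample selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the agreement rule (keep a sample only when a confident prediction matches the nearest class prototype), the Euclidean distance, the `min_confidence` threshold, the running-mean prototype update, and all function names are assumptions made for illustration.

```python
import math


def nearest_prototype(embedding, prototypes):
    """Return (label, distance) of the closest class prototype (Euclidean)."""
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(embedding, proto)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


def select_pseudo_labeled(embeddings, probabilities, prototypes, min_confidence=0.9):
    """Keep samples whose confident prediction agrees with the nearest prototype.

    The agreement rule and the threshold are assumptions, not the paper's rule.
    Returns a list of (sample_index, pseudo_label) pairs.
    """
    selected = []
    for index, (embedding, probs) in enumerate(zip(embeddings, probabilities)):
        predicted = max(probs, key=probs.get)
        if probs[predicted] < min_confidence:
            continue  # confidence-driven filtering
        proto_label, _ = nearest_prototype(embedding, prototypes)
        if proto_label == predicted:
            selected.append((index, predicted))  # pseudo-label = predicted class
    return selected


def update_prototype(prototype, count, embedding):
    """Running-mean update of a prototype after one new embedding (assumed rule)."""
    new_count = count + 1
    updated = [p + (e - p) / new_count for p, e in zip(prototype, embedding)]
    return updated, new_count


if __name__ == "__main__":
    # Toy 2-D embeddings and class probabilities; all values are made up.
    prototypes = {"yes": [0.0, 0.0], "no": [1.0, 1.0]}
    embeddings = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
    probabilities = [
        {"yes": 0.95, "no": 0.05},  # confident and agrees with prototype
        {"yes": 0.97, "no": 0.03},  # confident but nearest prototype is "no"
        {"yes": 0.60, "no": 0.40},  # not confident enough
    ]
    print(select_pseudo_labeled(embeddings, probabilities, prototypes))
```

In a pipeline of this kind, the selected pairs would be merged with rehearsal-buffer samples before an incremental retraining step; how the buffer is sampled and how the quantized model is retrained are left out here because the abstract does not specify them.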