Adaptive filters (AFs) are vital for enhancing the performance of downstream tasks such as speech recognition, sound event detection, and keyword spotting. However, traditional AF design prioritizes isolated signal-level objectives and often overlooks downstream task performance, which can leave the overall system suboptimal. Recent research has leveraged meta-learning to automatically learn AF update rules from data, alleviating the need for manual tuning when using simple signal-level objectives. This paper extends the Meta-AF framework to support end-to-end training for arbitrary downstream tasks. We focus on classification tasks, for which we introduce a novel training methodology that harnesses self-supervision and classifier feedback. We evaluate our approach on the combined task of acoustic echo cancellation and keyword spotting. Our findings demonstrate consistent performance improvements with both pre-trained and jointly trained keyword spotting models across synthetic and real playback. Notably, these improvements come without requiring additional tuning, increased inference-time complexity, or reliance on oracle signal-level training data.
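For readers unfamiliar with the conventional setup that the abstract contrasts against, the following is a minimal sketch of a hand-designed adaptive filter driven by a purely signal-level objective: a normalized least-mean-squares (NLMS) echo canceller. It is not the Meta-AF method and is not taken from the paper; the function names, tap count, step size, and the synthetic echo path in `demo` are all illustrative assumptions.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, num_taps=64, step_size=0.5, eps=1e-8):
    """Cancel far-end echo from `mic` with a hand-designed NLMS update rule.

    Hypothetical helper for illustration only; parameter values are assumptions.
    """
    weights = np.zeros(num_taps)      # current estimate of the echo path
    history = np.zeros(num_taps)      # recent far-end samples, newest first
    residual = np.zeros(len(mic))     # echo-cancelled output
    for n in range(len(mic)):
        # Shift in the newest far-end sample.
        history = np.roll(history, 1)
        history[0] = far_end[n]
        # Predict the echo and subtract it from the microphone signal.
        residual[n] = mic[n] - weights @ history
        # Signal-level objective: reduce the instantaneous squared residual.
        # No downstream classifier is consulted anywhere in this update.
        weights += step_size * residual[n] * history / (history @ history + eps)
    return residual, weights


def demo(seed=0, num_samples=4000):
    """Run the canceller on a synthetic, noise-free echo (assumed scenario)."""
    rng = np.random.default_rng(seed)
    far_end = rng.standard_normal(num_samples)
    # Assumed short, exponentially decaying echo path.
    echo_path = rng.standard_normal(16) * np.exp(-0.3 * np.arange(16))
    mic = np.convolve(far_end, echo_path)[:num_samples]
    residual, weights = nlms_echo_canceller(far_end, mic)
    return mic, residual, weights, echo_path
```

The update rule and step size here are fixed by hand and see only the residual signal. The approach described in the abstract instead learns the update rule from data and, during training, also uses feedback from the downstream classifier.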