We propose a novel neural network-based end-to-end acoustic echo cancellation (E2E-AEC) method that supports streaming inference and operates effectively without relying on traditional linear AEC (LAEC) or time delay estimation. Our approach combines four key strategies. First, we introduce and refine progressive learning to gradually enhance echo suppression. Second, we transfer knowledge by initializing the model from a pre-trained LAEC-based model, reusing what was learned during LAEC training. Third, we optimize the attention mechanism with a loss function applied to the attention weights to achieve precise time alignment between the reference and microphone signals. Lastly, we incorporate voice activity detection to enhance speech quality and improve echo removal by masking the network output when near-end speech is absent. Experiments on public datasets validate the effectiveness of our approach.
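To make the last two strategies concrete, the following is a minimal sketch rather than the method described above. It assumes the attention weights form a probability distribution over candidate delays for each frame, that a ground-truth delay index is available during training, and that the alignment loss is a cross-entropy; none of these details are stated in the abstract. The names `attention_alignment_loss`, `apply_vad_mask`, and the `threshold` parameter are hypothetical.

```python
import math


def attention_alignment_loss(attn_weights, true_delay):
    """Mean cross-entropy between per-frame attention weights and a delay index.

    attn_weights: list of frames, each a list of probabilities over candidate
        delays (assumed to sum to 1 per frame).
    true_delay: index of the ground-truth delay (assumed supervision signal).
    """
    eps = 1e-8  # avoids log(0) when a weight is exactly zero
    losses = [-math.log(frame[true_delay] + eps) for frame in attn_weights]
    return sum(losses) / len(losses)


def apply_vad_mask(output_frames, vad_probs, threshold=0.5):
    """Zero out output frames where near-end speech is judged absent.

    output_frames: list of frames, each a list of samples or spectral bins.
    vad_probs: per-frame probability that near-end speech is present.
    threshold: hypothetical decision threshold, not taken from the paper.
    """
    return [
        frame if prob >= threshold else [0.0] * len(frame)
        for frame, prob in zip(output_frames, vad_probs)
    ]


# Toy usage: three output frames, the middle one has no near-end speech.
frames = [[0.2, -0.1], [0.05, 0.03], [0.4, 0.3]]
vad = [0.9, 0.1, 0.8]
masked = apply_vad_mask(frames, vad)

# Attention concentrated on the true delay (index 1) gives a lower loss.
aligned = attention_alignment_loss([[0.1, 0.8, 0.1]], true_delay=1)
misaligned = attention_alignment_loss([[0.8, 0.1, 0.1]], true_delay=1)
```

In this sketch the mask is a hard per-frame gate, so residual echo in frames without near-end speech is removed entirely; a soft mask (multiplying by the VAD probability) would be an equally plausible reading of the abstract.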