Deep-learning-based techniques have been widely adopted in acoustic echo cancellation (AEC). The use of speaker representations has extended the frontier of AEC and drawn growing research interest in personalized acoustic echo cancellation (PAEC). Meanwhile, task-decoupling strategies are widely adopted in speech enhancement. To explore the task-decoupling approach further, we propose a two-stage task-decoupling post-filter (TDPF) for PAEC. In addition, a multi-scale local-global speaker representation is applied to improve speaker extraction in PAEC. Experimental results indicate that the task-decoupling model outperforms a single joint network, and that the best decoupling separates echo cancellation from the suppression of noise and interfering speech. Based on this decoupling sequence, we then explore optimal training strategies for the two-stage model.
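The two-stage data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' TDPF: both stages are neural networks in the paper, whereas the stand-ins here are hand-written masks, and the function names, the projection matrix `proj`, and the choice to run echo cancellation first are assumptions made purely for illustration.

```python
import numpy as np


def stage1_echo_cancel(mic_spec, far_end_spec, eps=1e-8):
    """Hypothetical stage 1: attenuate echo with a Wiener-like mask.

    Both inputs are complex spectrograms of shape (freq, time).
    A trained network would replace this hand-written mask.
    """
    mic_pow = np.abs(mic_spec) ** 2
    echo_pow = np.abs(far_end_spec) ** 2
    mask = mic_pow / (mic_pow + echo_pow + eps)  # values in [0, 1)
    return mask * mic_spec


def stage2_suppress(spec, speaker_embedding, proj):
    """Hypothetical stage 2: speaker-conditioned gain per frequency bin.

    `proj` (freq, emb_dim) maps the target-speaker embedding to one
    logit per frequency bin; a sigmoid turns it into a gain in (0, 1).
    """
    logits = proj @ speaker_embedding            # shape (freq,)
    gain = 1.0 / (1.0 + np.exp(-logits))
    return gain[:, None] * spec                  # broadcast over time


def two_stage_post_filter(mic_spec, far_end_spec, speaker_embedding, proj):
    """Run the decoupled stages in sequence (assumed order: echo first)."""
    echo_free = stage1_echo_cancel(mic_spec, far_end_spec)
    return stage2_suppress(echo_free, speaker_embedding, proj)
```

Keeping each stage behind its own function mirrors the decoupling idea: either stage can be trained, frozen, or swapped without touching the other, which is what makes separate training strategies for the two stages possible.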