X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion

Target speech extraction (TSE) systems are designed to extract target speech from a multi-talker mixture. The popular training objective for most prior TSE networks is to enhance reconstruction performance of extracted speech waveform. However, it has been reported that a TSE system delivers high reconstruction performance may still suffer low-quality experience problems in practice. One such experience problem is wrong speaker extraction (called speaker confusion, SC), which leads to strong negative experience and hampers effective conversations. To mitigate the imperative SC issue, we reformulate the training objective and propose two novel loss schemes that explore the metric of reconstruction improvement performance defined at small chunk-level and leverage the metric associated distribution information. Both loss schemes aim to encourage a TSE network to pay attention to those SC chunks based on the said distribution information. On this basis, we present X-SepFormer, an end-to-end TSE model with proposed loss schemes and a backbone of SepFormer. Experimental results on the benchmark WSJ0-2mix dataset validate the effectiveness of our proposals, showing consistent improvements on SC errors (by 14.8% relative). Moreover, with SI-SDRi of 19.4 dB and PESQ of 3.81, our best system significantly outperforms the current SOTA systems and offers the top TSE results reported till date on the WSJ0-2mix.

翻译：目标语音提取（TSE）系统旨在从多说话人混合音频中提取目标语音。以往多数TSE网络常用的训练目标是提升提取语音波形的重建性能。然而，已有研究表明，具有高重建性能的TSE系统在实际应用中仍可能面临低质量体验问题。其中一种体验问题是错误说话人提取（即说话人混淆，SC），这会导致强烈的负面体验并妨碍有效对话。为缓解这一迫切的SC问题，我们重新设计了训练目标，提出了两种新型损失方案，分别探索以小片段级定义的重建改进性能度量，并利用与该度量相关的分布信息。两种损失方案均旨在引导TSE网络基于上述分布信息关注那些发生SC的片段。在此基础上，我们提出了X-SepFormer，一种端到端TSE模型，结合了所提出的损失方案与SepFormer骨干网络。在标准WSJ0-2mix数据集上的实验结果验证了我们方法的有效性，在SC错误率上实现了14.8%的相对一致改善。此外，我们的最优系统以19.4 dB的SI-SDRi和3.81的PESQ显著优于当前最先进系统，并在WSJ0-2mix数据集上提供了截至目前公开报道的最佳TSE结果。

相关内容

TSE

关注 0

IEEE软件工程事务处理对定义明确的理论结果和对软件的构建、分析或管理有潜在影响的实证研究感兴趣。这些交易的范围从制定原则的机制到将这些原则应用到具体环境。具体的主题领域包括：a）开发和维护方法和模型，例如软件系统的规范、设计和实现的技术和原则，包括符号和过程模型；b）评估方法，例如软件测试和验证、可靠性模型、测试和诊断程序，用于错误控制的软件冗余和设计，以及过程和产品各个方面的测量和评估；c）软件项目管理，例如生产力因素、成本模型、进度和组织问题、标准；d）工具和环境，例如特定工具，集成工具环境，包括相关的体系结构、数据库、并行和分布式处理问题；e）系统问题，例如硬件-软件权衡；f）最新调查，提供对某一特定关注领域历史发展的综合和全面审查。官网地址：http://dblp.uni-trier.de/db/journals/tse/

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日