Neural Target Speech Extraction: An Overview

Humans can attend to a target speaker even in challenging acoustic conditions with noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail-party effect. For decades, researchers have worked toward matching this human listening ability. One critical issue is handling interfering speakers: because target and non-target speech signals share similar characteristics, they are hard to discriminate. Target speech/speaker extraction (TSE) isolates the speech signal of a target speaker from a mixture of several speakers, with or without noise and reverberation, using clues that identify the speaker in the mixture. Such clues might be a spatial clue indicating the direction of the target speaker, a video of the speaker's lips, or a pre-recorded enrollment utterance from which the speaker's voice characteristics can be derived. TSE is an emerging field of research that has received increased attention in recent years because it offers a practical approach to the cocktail-party problem and draws on several areas of signal processing, including audio, visual, and array processing, as well as deep learning. This paper focuses on recent neural-based approaches and presents an in-depth overview of TSE. We guide readers through the major approaches, emphasizing the similarities among frameworks, and discuss potential future directions.
