This paper introduces a novel low-latency online beamforming (BF) algorithm, named Modified Parametric Multichannel Wiener Filter (Mod-PMWF), for enhancing speech mixtures with an unknown and varying number of speakers. Although conventional BFs such as the linearly constrained minimum variance (LCMV) BF can enhance a speech mixture, they typically require attributes of the mixture such as the number of speakers and the acoustic transfer functions (ATFs) from the speakers to the microphones. When these attributes are unavailable, estimating them with low-latency processing is challenging, which hinders the application of such BFs to this problem. In this paper, we overcome this difficulty by modifying the conventional Parametric Multichannel Wiener Filter (PMWF). The proposed Mod-PMWF adaptively forms a directivity pattern that enhances all the speakers in the mixture without explicitly estimating these attributes. Our experiments show the effectiveness of the proposed BF in terms of interference reduction ratio and subjective listening tests.