In this paper, we present a statistical beamforming algorithm as a pre-processing step for robust automatic speech recognition (ASR). Modeling the target speech with a non-stationary Laplacian distribution, we propose a mask-based statistical beamforming algorithm that exploits both the beamformer output variance and the masked input variance for robust estimation of the beamformer. We also present a method for steering vector estimation (SVE) based on a noise power ratio obtained from the target and noise outputs of independent component analysis (ICA). To update the beamformer in the same ICA framework, we derive ICA with distortionless and null constraints on the target speech, which yields beamformed speech at the target output and noise at the other outputs. The demixing weights for the target output form a statistical beamformer based on a weighted spatial covariance matrix (wSCM), whose weighting function is determined by the source model. To improve the SVE, the strict null constraints imposed by the Lagrange multiplier method are relaxed into generalized penalties with weight parameters, while the strict distortionless constraints are maintained. Furthermore, we derive an online algorithm based on recursive least squares (RLS) optimization for practical applications. Experimental results in various environments on the CHiME-4 and LibriCSS datasets demonstrate the effectiveness of the presented algorithm compared with conventional beamforming and ICA-based blind source extraction (BSE) in both batch and online processing.
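To make the idea of a distortionless beamformer built on a weighted spatial covariance matrix concrete, the following is a minimal NumPy sketch for a single frequency bin. It is an illustration under stated assumptions, not the algorithm derived in the paper: the function name `weighted_scm_beamformer`, the equal-weight average of output and masked input power, and the simple inverse-variance weight `phi = 1 / lam` (a simplification that stands in for the Laplacian-derived weighting function) are all hypothetical choices. The steering vector `h` is assumed to be given, so the ICA-based SVE, the relaxed null constraints, and the RLS online update are not covered.

```python
import numpy as np


def weighted_scm_beamformer(X, h, mask=None, n_iter=5, eps=1e-6):
    """Iteratively estimate a distortionless beamformer for one frequency bin.

    X    : (M, T) complex array of STFT coefficients (M microphones, T frames).
    h    : (M,) complex steering vector, assumed known here.
    mask : (T,) real array in [0, 1]; target-presence mask (all ones if None).
    Returns the weights w of shape (M,) and the output y = w^H X of shape (T,).
    """
    M, T = X.shape
    if mask is None:
        mask = np.ones(T)

    # Masked input variance: mask times the mean power across microphones.
    lam_in = mask * np.mean(np.abs(X) ** 2, axis=0)

    # Initial weights proportional to h, scaled so that w^H h = 1.
    w = h / np.vdot(h, h).real

    for _ in range(n_iter):
        y = w.conj() @ X  # current beamformer output, shape (T,)

        # Assumption: time-varying variance as the plain average of the
        # output power and the masked input power (hypothetical combination).
        lam = np.maximum(0.5 * (np.abs(y) ** 2 + lam_in), eps)

        # Assumption: inverse-variance weight, not the paper's weighting function.
        phi = 1.0 / lam

        # Weighted spatial covariance matrix with light diagonal loading.
        V = (X * phi) @ X.conj().T / T
        V = V + eps * (np.trace(V).real / M) * np.eye(M)

        # Minimize w^H V w subject to w^H h = 1:  w = V^{-1} h / (h^H V^{-1} h).
        u = np.linalg.solve(V, h)
        w = u / np.vdot(h, u)

    return w, w.conj() @ X
```

Whatever weights are chosen, the final normalization keeps `w^H h = 1`, so a signal arriving along `h` passes through undistorted while the weighted covariance shapes how strongly the remaining components are suppressed.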