In this paper, we present a statistical beamforming algorithm as a pre-processing step for robust automatic speech recognition (ASR). Modeling the target speech with a non-stationary Laplacian distribution, we propose a mask-based statistical beamforming algorithm that exploits both the output variance and the masked input variance for robust estimation of the beamformer. We also present a method for steering vector estimation (SVE) based on a noise power ratio obtained from the target and noise outputs of independent component analysis (ICA). To update the beamformer within the same ICA framework, we derive ICA with distortionless and null constraints on the target speech, which yields beamformed speech at the target output and noise at the other outputs. The demixing weights for the target output form a statistical beamformer based on a weighted spatial covariance matrix (wSCM), whose weighting function is determined by the source model. To enhance the SVE, the strict null constraints imposed by the Lagrange multiplier method are relaxed into generalized penalties with weight parameters, while the strict distortionless constraints are maintained. Furthermore, we derive an online algorithm based on recursive least squares (RLS) optimization for practical applications. Experimental results in various environments using the CHiME-4 and LibriCSS datasets demonstrate the effectiveness of the presented algorithm compared to conventional beamforming and ICA-based blind source extraction (BSE) in both batch and online processing.
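To make the idea of a distortionless beamformer built from a weighted spatial covariance matrix more concrete, the following is a minimal batch sketch for a single frequency bin. It is not the algorithm of this paper: the function name `wscm_beamformer`, the weighting `1 / max(|y_t|, eps)` (chosen here only as a stand-in for a weight derived from a sparse source model), the diagonal loading, and the iteration count are all assumptions made for illustration. The mask, the null-constrained ICA outputs, the SVE step, and the RLS online update are omitted.

```python
import numpy as np


def wscm_beamformer(X, h, n_iter=5, eps=1e-6):
    """Illustrative distortionless beamformer from a weighted SCM.

    X : (T, M) complex array, T frames of M-channel STFT data at one frequency.
    h : (M,) complex steering vector (assumed known in this sketch).
    Returns w : (M,) complex weights satisfying w^H h = 1.
    """
    T, M = X.shape
    # Start from a simple distortionless solution, w = h / (h^H h).
    w = h / np.vdot(h, h)
    for _ in range(n_iter):
        # Beamformer output y_t = w^H x_t for every frame.
        y = X @ w.conj()
        # Assumed weighting: frames with small output magnitude get large weight.
        phi = 1.0 / np.maximum(np.abs(y), eps)
        # Weighted SCM: V = (1/T) * sum_t phi_t x_t x_t^H, plus small diagonal loading.
        V = (X.T * phi) @ X.conj() / T + eps * np.eye(M)
        # Minimise w^H V w subject to w^H h = 1: w = V^{-1} h / (h^H V^{-1} h).
        Vinv_h = np.linalg.solve(V, h)
        w = Vinv_h / np.vdot(h, Vinv_h)
    return w


# Synthetic usage example (all values hypothetical).
rng = np.random.default_rng(0)
T, M = 200, 4
h = np.exp(-1j * np.pi * 0.3 * np.arange(M))                 # toy steering vector
s = rng.laplace(size=T) * np.exp(2j * np.pi * rng.random(T))  # sparse toy target
noise = (rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M))) / np.sqrt(2)
X = s[:, None] * h[None, :] + 0.5 * noise

w = wscm_beamformer(X, h)
y = X @ w.conj()                      # enhanced target estimate, shape (T,)
response = np.vdot(w, h)              # w^H h, equal to 1 by construction
```

Because each update solves a constrained minimisation with the constraint `w^H h = 1`, the target component passes through undistorted whenever `h` matches the true steering vector. In practice `h` has to be estimated, which is the role of the SVE step described above.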