Online Similarity-and-Independence-Aware Beamformer for Low-latency Target Sound Extraction

This study introduces an online target sound extraction (TSE) process using the similarity-and-independence-aware beamformer (SIBF), derived from an iterative batch algorithm. The aim is to reduce latency while maintaining extraction accuracy. The SIBF, which is a linear method, estimates the target more accurately than the approximate magnitude spectrogram given as a reference. The transition to an online algorithm reduces latency but presents two challenges. First, contrary to the conventional assumption, the derived online algorithm may be less accurate than the batch algorithm applied with a sliding window. Second, conventional post-processing methods for scaling the estimated target may widen the accuracy gap between the two algorithms. To address these challenges and minimize the accuracy gap during post-processing, this study proposes a novel scaling method based on the single-channel Wiener filter (SWF-based scaling). To further improve accuracy, the study introduces a modified version of the time-frequency-varying variance generalized Gaussian distribution as a source model to represent the joint probability between the target and the reference. Experimental results on the CHiME-3 dataset demonstrate several key findings: 1) SWF-based scaling effectively eliminates the gap between the two algorithms and improves accuracy. 2) The new source model achieves its best accuracy when it corresponds to the Laplacian model. 3) The online SIBF outperforms conventional linear TSE methods, including independent vector extraction and minimum mean square error beamforming. These findings can contribute to the fields of beamforming and blind source separation.
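To make the batch-versus-online distinction in the scaling step concrete, the following is a minimal sketch of a generic single-tap least-squares (Wiener-type) scale factor, computed once over a whole batch and then frame by frame with exponentially weighted running statistics. It is an illustration under stated assumptions, not the SWF-based scaling defined in the paper: the function names, the forgetting factor `forget`, and the choice of a single reference-microphone spectrogram `x` are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # small floor to avoid division by zero (assumed value)


def batch_scale(y, x):
    """Per-frequency scale g[f] minimizing sum_t |x[f, t] - g[f] * y[f, t]|^2.

    y: unscaled beamformer output, shape (n_freq, n_frames), complex.
    x: observation at a reference microphone, same shape (an assumption).
    """
    num = np.sum(x * np.conj(y), axis=1)
    den = np.sum(np.abs(y) ** 2, axis=1) + EPS
    return num / den


def online_scale(y, x, forget=0.99):
    """Frame-by-frame variant using exponentially weighted statistics.

    Returns the scaled output; each frame uses only past and current frames,
    so no look-ahead is needed.
    """
    n_freq, n_frames = y.shape
    num = np.zeros(n_freq, dtype=complex)
    den = np.full(n_freq, EPS)
    out = np.empty(y.shape, dtype=complex)
    for t in range(n_frames):
        # Recursive updates of cross- and auto-power estimates.
        num = forget * num + x[:, t] * np.conj(y[:, t])
        den = forget * den + np.abs(y[:, t]) ** 2
        out[:, t] = (num / den) * y[:, t]
    return out
```

In this sketch the batch version yields one scale per frequency bin for the whole segment, whereas the online version lets the scale drift with the running statistics; how closely the two agree depends on the forgetting factor and on how stationary the signals are.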
