Neural networks have recently become the dominant approach to sound separation. Their strong performance relies on large datasets of isolated recordings. For speech and music, isolated single-channel data are readily available; however, the same does not hold for multi-channel recordings or for most other sound classes. Multi-channel methods have the potential to outperform single-channel approaches because they can exploit both spatial and spectral features, but the lack of training data remains a challenge. We propose unsupervised improved minimum variance distortionless response (UIMVDR), which enables multi-channel separation to leverage in-the-wild single-channel data through unsupervised training and beamforming. Results show that UIMVDR generalizes well and improves separation performance compared to supervised models, particularly when supervised data are limited. By using data available online, it also reduces the effort required to gather data for multi-channel approaches.
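For readers unfamiliar with the beamformer that UIMVDR builds on, the following is a minimal sketch of the classical MVDR solution for a single frequency bin, w = R_n^{-1} d / (d^H R_n^{-1} d). It is not the proposed UIMVDR method. The function names, the diagonal-loading term, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def mvdr_weights(noise_cov, steering, diag_load=1e-6):
    """Classical MVDR weights for one frequency bin.

    noise_cov: (C, C) Hermitian noise spatial covariance matrix.
    steering:  (C,) complex steering vector of the target source.
    diag_load: relative diagonal loading (an assumption, added for
               numerical stability; not taken from the abstract).
    """
    num_ch = noise_cov.shape[0]
    loading = diag_load * np.trace(noise_cov).real / num_ch
    loaded = noise_cov + loading * np.eye(num_ch)
    # Solve R_n x = d instead of forming the explicit inverse.
    rinv_d = np.linalg.solve(loaded, steering)
    # Normalise so that w^H d = 1 (the distortionless constraint).
    return rinv_d / (steering.conj() @ rinv_d)


def apply_beamformer(weights, mixture):
    """Apply w^H x to a (C, T) block of STFT frames; returns shape (T,)."""
    return weights.conj() @ mixture


if __name__ == "__main__":
    # Synthetic single-bin example: one source plus uncorrelated noise.
    rng = np.random.default_rng(0)
    num_ch, num_frames = 4, 500
    d = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, num_ch))
    s = rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)
    n = (rng.standard_normal((num_ch, num_frames))
         + 1j * rng.standard_normal((num_ch, num_frames)))
    x = d[:, None] * s[None, :] + n

    noise_cov = n @ n.conj().T / num_frames
    w = mvdr_weights(noise_cov, d)
    y = apply_beamformer(w, x)

    print("distortionless response w^H d:", w.conj() @ d)
    print("residual power, reference channel:", np.mean(np.abs(n[0]) ** 2))
    print("residual power, MVDR output:", np.mean(np.abs(y - s) ** 2))
```

In practice the noise covariance and steering vector are not known and must be estimated, for example from time-frequency masks predicted by a separation network. That estimation step is where a learned single-channel model can supply spectral information to the spatial filter.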