This paper presents a novel approach to sound source separation that exploits spatial information obtained from the recording setup. Our method uses solo passages to train a spatial mixing filter that captures the room impulse response and the transducer response at each sensor location. This pre-trained filter is then integrated into a multichannel non-negative matrix factorization (MNMF) scheme so that the variances of the individual sound sources are modeled more accurately. The experiments use the setup typical of orchestral recordings: a main microphone plus a close cardioid or supercardioid microphone for each section of the orchestra. Because this setup is so common, the proposed method is applicable to many existing recordings. Experiments on polyphonic ensembles show that the proposed framework separates individual sound sources more effectively than conventional MNMF methods.
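The pipeline described above (train a spatial model on solo passages, then keep it fixed inside MNMF) can be illustrated with a small sketch. It assumes that the "spatial mixing filter" is represented by a full-rank, per-frequency spatial covariance matrix, as in the common Itakura-Saito MNMF formulation. That representation is an assumption, and this is not the paper's implementation. Only the NMF spectral bases `W` and activations `H` are updated, using multiplicative updates in which the spatial terms are held constant. The sources are then recovered with a multichannel Wiener filter. The function names `estimate_spatial_covariance` and `mnmf_fixed_spatial`, along with all parameters, are hypothetical.

```python
import numpy as np


def estimate_spatial_covariance(solo_stft, eps=1e-12):
    """Estimate one M x M spatial covariance matrix per frequency bin.

    solo_stft: complex array (F, T, M), the multichannel STFT of a passage in
    which only one source plays (assumption: solo passages are available).
    Returns (F, M, M) Hermitian matrices normalised to unit trace, leaving the
    source's level to the NMF variance model.
    """
    n_frames = solo_stft.shape[1]
    # Average of the outer products x_ft x_ft^H over all frames t.
    cov = np.einsum("ftm,ftn->fmn", solo_stft, solo_stft.conj()) / n_frames
    trace = np.real(np.einsum("fmm->f", cov))
    return cov / (trace[:, None, None] + eps)


def mnmf_fixed_spatial(mix_stft, spatial_covs, n_bases=4, n_iter=50, seed=0):
    """IS-divergence multichannel NMF with the spatial covariances held fixed.

    mix_stft: (F, T, M) complex mixture STFT.
    spatial_covs: (N, F, M, M) pre-trained covariances, one per source.
    Returns per-source spatial image estimates of shape (N, F, T, M).
    """
    rng = np.random.default_rng(seed)
    n_src, n_freq = spatial_covs.shape[:2]
    n_frames = mix_stft.shape[1]
    W = rng.uniform(0.1, 1.0, (n_src, n_freq, n_bases))    # spectral bases
    H = rng.uniform(0.1, 1.0, (n_src, n_bases, n_frames))  # activations

    def model():
        lam = np.einsum("nfk,nkt->nft", W, H)                # source variances
        xhat = np.einsum("nft,nfmp->ftmp", lam, spatial_covs)  # model covariance
        return lam, xhat

    for _ in range(n_iter):
        for update_w in (True, False):
            _, xhat = model()
            xhat_inv = np.linalg.inv(xhat)
            y = np.einsum("ftmp,ftp->ftm", xhat_inv, mix_stft)  # Xhat^{-1} x
            # a = y^H R_n y and b = tr(Xhat^{-1} R_n); both are real and positive.
            a = np.real(np.einsum("ftm,nfmp,ftp->nft", y.conj(), spatial_covs, y))
            b = np.real(np.einsum("ftmp,nfpm->nft", xhat_inv, spatial_covs))
            if update_w:
                W *= np.sqrt(np.einsum("nkt,nft->nfk", H, a)
                             / np.einsum("nkt,nft->nfk", H, b))
            else:
                H *= np.sqrt(np.einsum("nfk,nft->nkt", W, a)
                             / np.einsum("nfk,nft->nkt", W, b))

    lam, xhat = model()
    # Multichannel Wiener filter: s_n = lam_n R_n Xhat^{-1} x.
    gain = np.einsum("nft,nfmp->nftmp", lam, spatial_covs) @ np.linalg.inv(xhat)[None]
    return np.einsum("nftmp,ftp->nftm", gain, mix_stft)


# Toy usage with synthetic noise "solos", each louder in a different microphone.
rng = np.random.default_rng(1)
F, T, M = 6, 30, 2
solo_a = (rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))) * np.array([1.0, 0.3])
solo_b = (rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))) * np.array([0.3, 1.0])
R = np.stack([estimate_spatial_covariance(solo_a), estimate_spatial_covariance(solo_b)])
mix = solo_a + solo_b
est = mnmf_fixed_spatial(mix, R, n_bases=2, n_iter=10)
```

Keeping the spatial covariances fixed is what makes pre-training on solo passages useful: the factorization then only has to explain the spectral content of each source. Because the Wiener gains sum to the identity, the separated images add back up to the mixture.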