Existing methods that exploit spatial information for sound source separation either require prior knowledge of each source's direction of arrival (DOA) or rely on estimated but imprecise localization results, which degrades separation performance, especially when the sound sources are moving. In fact, sound source localization and separation are interconnected problems: localization facilitates separation, while separation in turn enables refined localization. This paper proposes a method for moving sources that exploits this mutual facilitation mechanism. The proposed method comprises three stages. The first stage is initial tracking, which tracks each sound source in the audio mixture based on source signal envelope estimation; these tracking results may lack sufficient accuracy. The second stage is mutual facilitation: sound separation is performed using the preliminary tracking results, and sound source tracking is then performed on the separated signals, refining the tracking precision. The refined trajectories in turn improve separation performance, and this process can be iterated multiple times. In the third stage, a neural beamformer estimates precise single-channel separation results from the refined tracking trajectories and the multi-channel separation outputs. Simulation experiments under reverberant conditions with moving sound sources demonstrate that the proposed method achieves more accurate separation based on the refined tracking results.
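The three-stage procedure described in the abstract can be outlined as a control-flow sketch. This is only an illustration of the ordering of the stages, not the authors' implementation: the function name `mutual_facilitation`, the callables `initial_track`, `separate`, `track`, and `beamform`, the parameter `num_iterations`, and the list-based `Signal` and `Trajectory` types are all hypothetical placeholders for the models the paper would supply.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder types (assumptions, not from the paper):
Signal = List[float]      # stands in for an audio signal
Trajectory = List[float]  # stands in for one DOA estimate per time frame


def mutual_facilitation(
    mixture: Signal,
    initial_track: Callable[[Signal], List[Trajectory]],
    separate: Callable[[Signal, List[Trajectory]], List[Signal]],
    track: Callable[[List[Signal]], List[Trajectory]],
    beamform: Callable[[List[Signal], List[Trajectory]], List[Signal]],
    num_iterations: int = 2,
) -> Tuple[List[Signal], List[Trajectory]]:
    """Run the three stages in order and return (outputs, trajectories)."""
    # Stage 1: coarse trajectories estimated directly from the mixture.
    trajectories = initial_track(mixture)

    # Stage 2: separate with the current trajectories, then alternate
    # tracking on the separated signals and re-separating the mixture.
    separated = separate(mixture, trajectories)
    for _ in range(num_iterations):
        trajectories = track(separated)
        separated = separate(mixture, trajectories)

    # Stage 3: final single-channel estimates from the refined
    # trajectories and the separation outputs.
    outputs = beamform(separated, trajectories)
    return outputs, trajectories
```

Passing the stages in as callables keeps the sketch independent of any particular tracker, separator, or beamformer; how many iterations are useful in practice is not stated in the abstract, so the default of `num_iterations=2` is arbitrary.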