Sound Event Localization and Detection (SELD) combines Sound Event Detection (SED) with estimation of the corresponding Direction of Arrival (DOA). Recently adopted event-oriented multi-track methods generalize poorly in polyphonic environments because the number of tracks is limited. To improve generality in polyphonic environments, we propose Spatial Mapping and Regression Localization for SELD (SMRL-SELD). SMRL-SELD segments 3D space and maps it to a 2D plane, and a new regression localization loss is proposed to help the results converge toward the location of the corresponding event. SMRL-SELD is location-oriented, allowing the model to learn event features based on orientation. The method therefore enables the model to process polyphonic sounds regardless of the number of overlapping events. We conducted experiments on the STARSS23 and STARSS22 datasets, and the proposed SMRL-SELD outperforms existing SELD methods both in overall evaluation and in polyphonic environments.
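As an illustration of the location-oriented idea, the following minimal sketch maps a DOA on the sphere to a cell of a 2D grid plus a within-cell offset that a regression loss could target. The abstract does not specify how the space is segmented, so the equal-angle azimuth/elevation partition, the grid size, and the function names `doa_to_cell` and `cell_to_doa` are all assumptions made for illustration and are not the authors' implementation.

```python
import math


def doa_to_cell(azimuth_deg, elevation_deg, n_azi=8, n_ele=4):
    """Map a DOA in degrees to (row, col, d_row, d_col).

    Assumed layout (hypothetical, not from the paper): the sphere is cut
    into n_ele x n_azi equal-angle cells; offsets lie in [0, 1].
    """
    # Wrap azimuth into [-180, 180) and clamp elevation into [-90, 90].
    azi = (azimuth_deg + 180.0) % 360.0 - 180.0
    ele = max(-90.0, min(90.0, elevation_deg))
    # Continuous coordinates on the 2D plane, in units of cells.
    u = (azi + 180.0) / 360.0 * n_azi
    v = (ele + 90.0) / 180.0 * n_ele
    # Integer cell index; the top edge (elevation 90) falls in the last row.
    col = min(int(u), n_azi - 1)
    row = min(int(v), n_ele - 1)
    return row, col, v - row, u - col


def cell_to_doa(row, col, d_row, d_col, n_azi=8, n_ele=4):
    """Inverse mapping: recover (azimuth, elevation) in degrees."""
    azi = (col + d_col) / n_azi * 360.0 - 180.0
    ele = (row + d_row) / n_ele * 180.0 - 90.0
    return azi, ele


if __name__ == "__main__":
    # Two overlapping events land in different cells, so neither needs a "track".
    for doa in [(30.0, 10.0), (-120.0, -45.0)]:
        row, col, d_row, d_col = doa_to_cell(*doa)
        print(doa, "-> cell", (row, col), "offset", (d_row, d_col))
```

Under this assumed layout, each cell can carry its own class activity and offset targets, so the number of simultaneous events the representation can hold is bounded by the grid resolution rather than by a fixed number of output tracks.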