Sound event localization and detection (SELD) is an important task in machine listening. Major advances rely on simulated data containing sound events in specific rooms, annotated with strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs must be collected manually in specific rooms. We present SpatialScaper, a library for SELD data simulation and augmentation. Compared to existing tools, SpatialScaper emulates virtual rooms via parameters such as room size and wall absorption. This allows parameterized placement (including movement) of foreground and background sound sources. SpatialScaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use SpatialScaper to add rooms to the DCASE SELD data. Training a model with our data led to progressive performance improvements as a direct function of acoustic diversity. These results show that SpatialScaper is valuable for training robust SELD models.
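The core simulation step described above, convolving a spatially localized RIR with a dry sound waveform and placing the result in a soundscape, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not SpatialScaper's API: the function name `place_event`, the channel-first array layout, the 24 kHz sample rate, and the toy RIR are all hypothetical. The sketch covers only a static source; a moving source would typically require a sequence of RIRs sampled along its trajectory.

```python
import numpy as np


def place_event(soundscape, event, rir, onset_sample):
    """Spatialize a mono event with a multichannel RIR and add it to a soundscape.

    Hypothetical helper (not part of SpatialScaper).
    soundscape: (n_channels, n_samples) float array, modified in place.
    event: (n_event_samples,) dry mono waveform.
    rir: (n_channels, n_rir_samples) impulse response, one row per channel.
    onset_sample: index at which the event starts in the soundscape.
    """
    n_channels, n_samples = soundscape.shape
    if rir.shape[0] != n_channels:
        raise ValueError("RIR channel count must match the soundscape")
    # Convolve the dry event with each channel's impulse response.
    wet = np.stack([np.convolve(event, h) for h in rir])
    # Truncate anything that would run past the end of the soundscape.
    end = min(onset_sample + wet.shape[1], n_samples)
    soundscape[:, onset_sample:end] += wet[:, : end - onset_sample]
    return soundscape


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 24000  # assumed sample rate
    scape = np.zeros((4, fs))  # 1 s, 4-channel soundscape
    event = rng.standard_normal(fs // 5)  # stand-in for a recorded sound event
    # Toy RIR (assumption): per-channel direct-path delay plus a decaying noise tail.
    t = np.arange(fs // 4) / fs
    toy_rir = 0.05 * rng.standard_normal((4, t.size)) * np.exp(-t / 0.05)
    for ch, delay in enumerate([0, 3, 5, 2]):
        toy_rir[ch, delay] += 1.0
    place_event(scape, event, toy_rir, onset_sample=fs // 10)
```

In a real pipeline the RIR would come from a measured room or a room simulator and the per-event onset, direction, and class would be written to the label file alongside the audio.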