We aim to perform sound event localization and detection (SELD) using wearable equipment for a moving human, such as a pedestrian. Conventional SELD tasks have dealt only with microphone arrays at static positions. Wearable microphone arrays, however, are subject to self-motion with three rotational and three translational degrees of freedom (6DoF), which must be taken into account. A system trained only on data recorded with fixed microphone arrays would be unable to adapt to the fast relative motion of sound events caused by self-motion, resulting in degraded SELD performance. To address this, we designed the 6DoF SELD Dataset for wearable systems, the first SELD dataset that considers the self-motion of microphones. Furthermore, we propose a multi-modal SELD system that jointly utilizes audio and motion tracking sensor signals. These sensor signals are expected to help the system find useful acoustic cues for SELD on the basis of the current self-motion state. Experimental results on our dataset show that the proposed method effectively improves SELD performance through a mechanism that extracts acoustic features conditioned on the sensor signals.