Temporal detection problems arise in many fields, including time-series estimation, activity recognition, and sound event detection (SED). In this work, we propose a new approach to temporal event modeling that explicitly models event onsets and offsets and introduces boundary-aware optimization and inference strategies that substantially improve temporal event detection. The proposed methodology incorporates two new temporal modeling layers, Recurrent Event Detection (RED) and the Event Proposal Network (EPN), which, together with tailored loss functions, enable more effective and precise temporal event detection. We evaluate the method in the SED domain on a subset of the strongly (temporally) annotated portion of AudioSet. Experimental results show that our approach outperforms traditional frame-wise SED models paired with state-of-the-art post-processing. It also removes the need for post-processing hyperparameter tuning, and it scales to achieve new state-of-the-art performance across all AudioSet Strong classes.