Temporal detection problems appear in many fields, including time-series estimation, activity recognition, and sound event detection (SED). In this work, we propose a new approach to temporal event modeling that explicitly models event onsets and offsets and introduces boundary-aware optimization and inference strategies, which substantially improve temporal event detection. The methodology incorporates two new temporal modeling layers, Recurrent Event Detection (RED) and the Event Proposal Network (EPN). Together with tailored loss functions, these layers enable more effective and more precise temporal event detection. We evaluate the proposed method in the SED domain on a subset of the portion of AudioSet that carries strong (temporal) annotations. Experimental results show that our approach outperforms traditional frame-wise SED models equipped with state-of-the-art post-processing. It also removes the need to tune post-processing hyperparameters, and it scales to achieve new state-of-the-art performance across all AudioSet Strong classes.
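For context on the baseline being replaced, the sketch below shows one common form of frame-wise SED post-processing: median-filter the per-frame probabilities of a single class, threshold them, and read events off the rising and falling edges. This is not the proposed RED/EPN method. The function names and the `threshold`, `median_width`, and `hop_seconds` values are hypothetical illustrations. These are exactly the kind of post-processing hyperparameters that typically must be tuned.

```python
import numpy as np


def median_filter_1d(x, width):
    """Odd-width median filter with edge padding (hypothetical helper)."""
    assert width % 2 == 1, "median width must be odd"
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    # Each row is one window of `width` consecutive frames.
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=-1)


def frames_to_events(probs, threshold=0.5, median_width=3, hop_seconds=0.1):
    """Convert one class's frame-wise probabilities into (onset, offset) pairs in seconds.

    `threshold`, `median_width` and `hop_seconds` are illustrative defaults
    (assumptions), i.e. the post-processing knobs a frame-wise model needs tuned.
    """
    smoothed = median_filter_1d(np.asarray(probs, dtype=float), median_width)
    active = smoothed >= threshold
    # Pad with inactive frames so every event has both a rising and a falling edge.
    padded = np.concatenate([[False], active, [False]])
    edges = np.diff(padded.astype(int))
    onsets = np.flatnonzero(edges == 1)    # first active frame of each event
    offsets = np.flatnonzero(edges == -1)  # first inactive frame after each event
    return [(on * hop_seconds, off * hop_seconds) for on, off in zip(onsets, offsets)]
```

With `probs = [0.1, 0.2, 0.9, 0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6]`, the median filter suppresses the isolated spike at frame 2 and fills the dip at frame 3, so two events remain. Setting `median_width=1` disables smoothing and yields three events instead. The detected boundaries shift with every hyperparameter choice, which illustrates why a model that predicts onsets and offsets directly can avoid this tuning step.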