Temporal action localization in videos remains a significant challenge in computer vision. Boundary-sensitive methods have been widely adopted, but they make incomplete use of intermediate and global information and rely on an inefficient proposal feature generator. To address these limitations, we propose the Sparse Multilevel Boundary Generator (SMBG), a framework that enhances the boundary-sensitive method with boundary classification and action completeness regression. SMBG features a multi-level boundary module that gathers boundary information at different lengths, enabling faster processing. Additionally, we introduce a sparse extraction confidence head that distinguishes information inside and outside the action, further optimizing the proposal feature generator. To improve the synergy between multiple branches and balance positive and negative samples, we propose a global guidance loss. We evaluate our method on two popular benchmarks, ActivityNet-1.3 and THUMOS14, where it achieves state-of-the-art performance with faster inference (2.47x the speed of BSN++ and 2.12x that of DBG). These results demonstrate that SMBG provides a simpler and more efficient solution for generating temporal action proposals. The proposed framework has the potential to improve both the accuracy and the speed of temporal action localization in video analysis. The code and models are available at \url{https://github.com/zhouyang-001/SMBG-for-temporal-action-proposal}.