Temporal action detection aims to recognize the action category and determine the start and end time of each action instance in untrimmed videos. Mixed methods, which simply merge anchor-based and anchor-free approaches, have achieved remarkable performance. However, two crucial issues remain in the mixed framework: (1) brute-force merging and handcrafted anchor design limit both the performance and the practical applicability of mixed methods; (2) a large number of false positives in action category predictions further degrade detection performance. In this paper, we propose a novel Boundary Discretization and Reliable Classification Network (BDRC-Net) that addresses these issues by introducing boundary discretization and reliable classification modules. Specifically, the boundary discretization module (BDM) elegantly merges anchor-based and anchor-free approaches in the form of boundary discretization, avoiding the handcrafted anchor design required by traditional mixed methods. Furthermore, the reliable classification module (RCM) predicts reliable action categories to reduce false positives in action category predictions. Extensive experiments on different benchmarks demonstrate that the proposed method achieves favorable performance compared with the state of the art. For example, BDRC-Net achieves an average mAP of 68.6% on THUMOS'14, outperforming the previous best by 1.5%. The code will be released at https://github.com/zhenyingfang/BDRC-Net.
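To make the idea of boundary discretization more concrete, the following is a minimal sketch of one generic way to discretize a boundary distance: the distance from a time step to an action boundary is split into a coarse bin index (a classification target, in the spirit of anchor-based methods) and a residual from the bin center (a regression target, in the spirit of anchor-free methods). The abstract does not specify how BDM is implemented, so the scheme itself and the names `encode_offset`, `decode_offset`, `max_range`, and `num_bins` are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class DiscretizedBoundary:
    """Hypothetical target pair for one boundary (not from the paper)."""
    bin_index: int   # coarse target: which interval the boundary falls in
    residual: float  # fine target: signed distance from that interval's center


def encode_offset(offset: float, max_range: float, num_bins: int) -> DiscretizedBoundary:
    """Split a boundary distance into a bin index plus a residual.

    Assumed scheme: [0, max_range] is divided into num_bins equal intervals.
    """
    bin_width = max_range / num_bins
    # Clip so that out-of-range distances map to the first or last bin.
    clipped = min(max(offset, 0.0), max_range)
    bin_index = min(int(clipped // bin_width), num_bins - 1)
    center = (bin_index + 0.5) * bin_width
    return DiscretizedBoundary(bin_index, clipped - center)


def decode_offset(target: DiscretizedBoundary, max_range: float, num_bins: int) -> float:
    """Recover the boundary distance from the bin index and residual."""
    bin_width = max_range / num_bins
    return (target.bin_index + 0.5) * bin_width + target.residual


if __name__ == "__main__":
    # A boundary 6.5 time units away, with a 16-unit range split into 8 bins.
    target = encode_offset(6.5, max_range=16.0, num_bins=8)
    print(target)
    print(decode_offset(target, max_range=16.0, num_bins=8))
```

In such a scheme no anchor sizes need to be designed by hand: the only choices are the range and the number of bins, and the residual keeps the final boundary continuous rather than restricted to bin centers.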