Existing Temporal Action Detection (TAD) methods typically apply a pre-processing step that converts an input video of varying length into a fixed-length sequence of snippet representations before temporal boundary estimation and action classification. This step temporally downsamples the video, which reduces the inference resolution and hampers detection performance at the original temporal resolution. In essence, the degradation stems from a temporal quantization error introduced during resolution downsampling and recovery. Although this error can negatively impact TAD performance, it is largely ignored by existing methods. To address this problem, we introduce a novel model-agnostic post-processing method that requires neither model redesign nor retraining. Specifically, we model the start and end points of action instances with a Gaussian distribution to enable temporal boundary inference at the sub-snippet level. We further introduce an efficient Taylor-expansion-based approximation, dubbed Gaussian Approximated Post-processing (GAP). Extensive experiments demonstrate that GAP consistently improves a wide variety of pre-trained, off-the-shelf TAD models on the challenging ActivityNet (+0.2% to +0.7% in average mAP) and THUMOS (+0.2% to +0.5% in average mAP) benchmarks. Such gains are already significant and comparable to those achieved by novel model designs. GAP can also be integrated into model training for further performance gains. Importantly, GAP enables lower temporal resolutions for more efficient inference, facilitating low-resource applications. The code will be available at https://github.com/sauradip/GAP
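To make the quantization error and the idea of sub-snippet inference concrete, the following is a minimal sketch, not the authors' GAP implementation. It assumes a per-snippet boundary score sequence (for example, start probabilities) whose peak is locally Gaussian, and refines the discrete argmax with a second-order Taylor expansion of the log-score around the peak (the same idea used for sub-pixel keypoint refinement in pose estimation). The function names `gaussian_scores`, `refine_peak`, and `to_frames`, the convention that snippet `i` maps to frame `i * stride`, and the example values are all hypothetical.

```python
import math


def gaussian_scores(mu, sigma, length):
    """Hypothetical boundary scores: an unnormalized Gaussian sampled at each snippet index."""
    return [math.exp(-((i - mu) ** 2) / (2 * sigma ** 2)) for i in range(length)]


def refine_peak(scores, eps=1e-10):
    """Estimate a sub-snippet peak location from discrete scores.

    Takes the argmax m, then applies one Newton step on log(scores) using
    central finite differences: m - L'(m) / L''(m). If the scores are exactly
    Gaussian, log(scores) is quadratic and this recovers the true mean.
    """
    m = max(range(len(scores)), key=scores.__getitem__)
    if m == 0 or m == len(scores) - 1:
        return float(m)  # peak on the border: no neighbours on both sides
    lm1, l0, lp1 = (math.log(max(scores[i], eps)) for i in (m - 1, m, m + 1))
    d1 = (lp1 - lm1) / 2.0       # first derivative of the log-score
    d2 = lp1 - 2.0 * l0 + lm1    # second derivative of the log-score
    if d2 >= 0:
        return float(m)  # not locally concave; keep the discrete peak
    return m - d1 / d2


def to_frames(snippet_pos, stride):
    """Hypothetical mapping from snippet position back to original frame index."""
    return snippet_pos * stride


# A true start at snippet position 10.3, sampled on a 32-snippet grid.
scores = gaussian_scores(10.3, 2.0, 32)
coarse = max(range(len(scores)), key=scores.__getitem__)
fine = refine_peak(scores)
print(to_frames(coarse, 8), round(to_frames(fine, 8), 3))
```

With a stride of 8 frames, the argmax recovers frame 80 (2.4 frames off), while the refined estimate recovers about frame 82.4. On real model outputs the scores are only approximately Gaussian, so the refinement reduces rather than eliminates the quantization error.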