DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization

Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task. Because the datasets involved are large-scale, most existing methods extract features with a network pretrained on other datasets, and these features are not well suited to WTAL. To address this problem, researchers have designed several feature-enhancement modules that improve the performance of the localization module, especially by modeling the temporal relationships between snippets. However, all of them neglect the adverse effects of ambiguous information, which reduces the discriminability of other snippets. Considering this phenomenon, we propose the Discriminability-Driven Graph Network (DDG-Net), which explicitly models ambiguous snippets and discriminative snippets with well-designed connections, preventing the transmission of ambiguous information and enhancing the discriminability of snippet-level representations. Additionally, we propose a feature consistency loss to prevent the assimilation of features and to drive the graph convolution network to generate more discriminative representations. Extensive experiments on the THUMOS14 and ActivityNet1.2 benchmarks demonstrate the effectiveness of DDG-Net, establishing new state-of-the-art results on both datasets. Source code is available at https://github.com/XiaojunTang22/ICCV2023-DDGNet.
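
The idea of preventing the transmission of ambiguous information can be illustrated with a small sketch. This is not the DDG-Net implementation (see the repository above for that); it only assumes that each snippet has a feature vector and an action score in [0, 1], that snippets with mid-range scores count as ambiguous, and that edges are weighted by cosine similarity. The function name `masked_propagation` and the thresholds `low` and `high` are hypothetical.

```python
import numpy as np

def masked_propagation(features, scores, low=0.3, high=0.7):
    """One propagation step in which ambiguous snippets never act as message sources.

    features: (T, D) snippet features; scores: (T,) action scores in [0, 1].
    The thresholding rule and the similarity graph are illustrative assumptions.
    """
    # Snippets with mid-range scores are treated as ambiguous (hypothetical rule).
    ambiguous = (scores > low) & (scores < high)
    # Non-negative cosine similarity between all pairs of snippets.
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    adjacency = np.clip(unit @ unit.T, 0.0, None)
    # Column j holds the messages sent by snippet j; zero them for ambiguous senders.
    adjacency[:, ambiguous] = 0.0
    # Every snippet keeps a self-loop so its own feature is preserved.
    np.fill_diagonal(adjacency, 1.0)
    # Row-normalize so each output is a weighted average of its sources.
    adjacency /= adjacency.sum(axis=1, keepdims=True)
    return adjacency @ features, ambiguous
```

Under these assumptions, an ambiguous snippet still receives information from discriminative snippets, but changing its feature leaves the outputs of all other snippets untouched.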

