DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization

Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task. Because the datasets are large-scale, most existing methods extract features with a network pretrained on other datasets, and these features are not well suited to WTAL. To address this problem, researchers have designed several feature-enhancement modules, particularly ones that model the temporal relationships between snippets, and these modules improve the performance of the localization module. However, all of them neglect the adverse effects of ambiguous information, which reduces the discriminability of other snippets. Motivated by this observation, we propose the Discriminability-Driven Graph Network (DDG-Net), which explicitly models ambiguous snippets and discriminative snippets with well-designed connections, preventing the transmission of ambiguous information and enhancing the discriminability of snippet-level representations. Additionally, we propose a feature consistency loss that prevents the assimilation of features and drives the graph convolutional network to generate more discriminative representations. Extensive experiments on the THUMOS14 and ActivityNet1.2 benchmarks demonstrate the effectiveness of DDG-Net, which establishes new state-of-the-art results on both datasets. Source code is available at \url{https://github.com/XiaojunTang22/ICCV2023-DDGNet}.
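The abstract describes the mechanism only at a high level. As a rough illustration of blocking ambiguous information, the following pure-Python toy sketch builds a snippet graph in which ambiguous snippets can receive messages but never send them, runs one propagation step, and computes a consistency penalty. Everything concrete here is an assumption rather than DDG-Net's published formulation: the threshold-based `pseudo_label` rule, the cosine-similarity edge weights in `build_adjacency`, the parameter-free `graph_conv` (a real graph convolution would also apply learnable weights), and the mean-squared-error form of `consistency_loss` are all hypothetical. The linked repository contains the authors' actual implementation.

```python
import math

# Toy sketch: every rule below is an illustrative assumption, not DDG-Net's method.
ACTION, BACKGROUND, AMBIGUOUS = "action", "background", "ambiguous"


def pseudo_label(score, low=0.3, high=0.7):
    """Hypothetical rule: confident actionness scores are discriminative, the rest ambiguous."""
    if score >= high:
        return ACTION
    if score <= low:
        return BACKGROUND
    return AMBIGUOUS


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_adjacency(feats, labels, sim_thresh=0.0):
    """Row-normalized adjacency; adj[i][j] > 0 means snippet j sends a message to i.

    sim_thresh should be >= 0 so that every edge weight is positive.
    """
    n = len(feats)
    adj = []
    for i in range(n):
        row = [0.0] * n
        row[i] = 1.0  # self-loop keeps each snippet's own feature
        for j in range(n):
            if i == j or labels[j] == AMBIGUOUS:
                continue  # ambiguous snippets never act as message sources
            if labels[i] != AMBIGUOUS and labels[i] != labels[j]:
                continue  # assumed: no edges between action and background snippets
            sim = cosine(feats[i], feats[j])
            if sim > sim_thresh:
                row[j] = sim
        total = sum(row)
        adj.append([w / total for w in row])
    return adj


def graph_conv(adj, feats):
    """One parameter-free propagation step: each output is a weighted mean of inputs."""
    dim = len(feats[0])
    return [
        [sum(adj[i][j] * feats[j][k] for j in range(len(feats))) for k in range(dim)]
        for i in range(len(feats))
    ]


def consistency_loss(original, enhanced):
    """Hypothetical form: mean squared distance between original and enhanced features."""
    diffs = [(a - b) ** 2 for o, e in zip(original, enhanced) for a, b in zip(o, e)]
    return sum(diffs) / len(diffs)
```

For example, with actionness scores `[0.9, 0.8, 0.5, 0.1]` the third snippet is labeled ambiguous: it aggregates features from the other three snippets, while no other row of the adjacency matrix gives it any weight. Restricting edges this way is only one plausible reading of "well-designed connections", chosen here because it makes the one-way flow of information easy to check.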

