Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization - 专知论文

会员服务 ·

0

可逆 · RE · 预训练 · 视频 · 端到端 ·

2023 年 3 月 28 日

Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

翻译：Re²TAL：面向可逆时序动作定位的预训练视频骨干网络重连

Chen Zhao,Shuming Liu,Karttikeya Mangalam,Bernard Ghanem

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content. Given limited GPU memory, training TAL end to end (i.e., from videos to predictions) on long videos is a significant challenge. Most methods can only train on pre-extracted features without optimizing them for the localization problem, consequently limiting localization performance. In this work, to extend the potential in TAL networks, we propose a novel end-to-end method Re2TAL, which rewires pretrained video backbones for reversible TAL. Re2TAL builds a backbone with reversible modules, where the input can be recovered from the output such that the bulky intermediate activations can be cleared from memory during training. Instead of designing one single type of reversible module, we propose a network rewiring mechanism, to transform any module with a residual connection to a reversible module without changing any parameters. This provides two benefits: (1) a large variety of reversible networks are easily obtained from existing and even future model designs, and (2) the reversible models require much less training effort as they reuse the pre-trained parameters of their original non-reversible versions. Re2TAL, only using the RGB modality, reaches 37.01% average mAP on ActivityNet-v1.3, a new state-of-the-art record, and mAP 64.9% at tIoU=0.5 on THUMOS-14, outperforming all other RGB-only methods.

翻译：时序动作定位（TAL）需要长程推理以预测不同时长和复杂内容的动作。在GPU内存有限的情况下，对长视频进行端到端（即从视频到预测）的TAL训练是一项重大挑战。大多数方法只能基于预提取特征进行训练，而无法对这些特征针对定位问题进行优化，从而限制了定位性能。为拓展TAL网络的潜力，本文提出一种新型端到端方法Re²TAL，通过重连预训练视频骨干网络实现可逆TAL。Re²TAL构建了包含可逆模块的骨干网络，其输入可从输出中恢复，从而在训练期间清除内存中的庞大中间激活值。我们并非设计单一类型的可逆模块，而是提出一种网络重连机制，可将任意具有残差连接的模块改造为可逆模块，且无需修改任何参数。这带来两大优势：（1）可从现有甚至未来的模型设计中轻松获得种类丰富的可逆网络；（2）可逆模型因复用其原始不可逆版本的预训练参数，所需训练代价大幅降低。仅使用RGB模态的Re²TAL在ActivityNet-v1.3上达到37.01%的平均mAP，创下新纪录；在THUMOS-14上于tIoU=0.5时获得mAP 64.9%，优于所有其他仅使用RGB的方法。

0

相关内容

【CVPR 2022】【视频检索用多模态融合Transformer】Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

【CVPR 2022】【视频检索用多模态融合Transformer】Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

专知会员服务

29+阅读 · 2022年3月6日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【ICCV2021】基于Transformer 的神经绘画

专知会员服务

23+阅读 · 2021年9月20日

【ECCV2020】EfficientFCN：语义分割中的整体引导解码器

【ECCV2020】EfficientFCN：语义分割中的整体引导解码器

专知会员服务

18+阅读 · 2020年8月23日

【CVPR2020-哈工大-京东】自监督结构建模的目标识别，Self-supervised Structure Modeling

【CVPR2020-哈工大-京东】自监督结构建模的目标识别，Self-supervised Structure Modeling

专知会员服务

43+阅读 · 2020年4月1日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【Google 大脑】使用上千个优化任务学习超参数搜索策略，Using a thousand optimization tasks to learn hyperparameter search strategies

【Google 大脑】使用上千个优化任务学习超参数搜索策略，Using a thousand optimization tasks to learn hyperparameter search strategies

专知会员服务

18+阅读 · 2020年3月14日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

专知会员服务

30+阅读 · 2020年1月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

【泡泡一分钟】从三维流动中学习单目视觉里程计及三维稠密建图

【泡泡一分钟】从三维流动中学习单目视觉里程计及三维稠密建图

泡泡机器人SLAM

12+阅读 · 2019年2月12日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

泡泡机器人SLAM

22+阅读 · 2019年1月17日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

泡泡机器人SLAM

13+阅读 · 2018年10月7日

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

专知

32+阅读 · 2018年8月14日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

基于单测点信号的桥梁损伤诊断与定位的混沌振子及重构相空间方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

稀疏多元传感器网络跟踪用电设备状态的序列解码与部署优化关键问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

时滞细胞神经网络的分支，混沌与同步控制

国家自然科学基金

0+阅读 · 2012年12月31日

分数阶系统的脉冲混沌动力学及稳定性

国家自然科学基金

0+阅读 · 2012年12月31日

顺序激活RARα及TrkB受体介导信号的协同机制对成体急性脊髓损伤神经再生修复的实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

仿人机器人同时定位和三维认知地图创建研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于纳米带场效应晶体管的ppb量级高灵敏SO2半导体传感器研究

国家自然科学基金

0+阅读 · 2012年12月31日

Rayleigh信道统计分析和建模

国家自然科学基金

0+阅读 · 2009年12月31日

连续变量量子纠缠态存储

国家自然科学基金

0+阅读 · 2009年12月31日

增强现实中多目标3D跟踪定位和WH-SIFT特征识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

Arxiv

0+阅读 · 2023年5月18日

Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams

Arxiv

0+阅读 · 2023年5月18日

Direct Visual Servoing Based on Discrete Orthogonal Moments

Arxiv

0+阅读 · 2023年5月18日

MAST: Multiscale Audio Spectrogram Transformers

Arxiv

0+阅读 · 2023年5月18日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

102+阅读 · 2022年5月11日

MetAug: Contrastive Learning via Meta Feature Augmentation

Arxiv

10+阅读 · 2022年3月10日

Survey: Transformer based Video-Language Pre-training

Arxiv

20+阅读 · 2021年9月21日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

3+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

5+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

6+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

7+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

10+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

11+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

15+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

8+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

10+阅读 · 6月17日

相关VIP内容

【CVPR 2022】【视频检索用多模态融合Transformer】Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

【CVPR 2022】【视频检索用多模态融合Transformer】Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

专知会员服务

29+阅读 · 2022年3月6日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【ICCV2021】基于Transformer 的神经绘画

专知会员服务

23+阅读 · 2021年9月20日

【ECCV2020】EfficientFCN：语义分割中的整体引导解码器

【ECCV2020】EfficientFCN：语义分割中的整体引导解码器

专知会员服务

18+阅读 · 2020年8月23日

【CVPR2020-哈工大-京东】自监督结构建模的目标识别，Self-supervised Structure Modeling

【CVPR2020-哈工大-京东】自监督结构建模的目标识别，Self-supervised Structure Modeling

专知会员服务

43+阅读 · 2020年4月1日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【Google 大脑】使用上千个优化任务学习超参数搜索策略，Using a thousand optimization tasks to learn hyperparameter search strategies

【Google 大脑】使用上千个优化任务学习超参数搜索策略，Using a thousand optimization tasks to learn hyperparameter search strategies

专知会员服务

18+阅读 · 2020年3月14日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

【AAAI2020-Oral】自监督时空学习的视频完形程序，Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

专知会员服务

30+阅读 · 2020年1月2日

热门VIP内容

开通专知VIP会员享更多权益服务

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

【泡泡一分钟】从三维流动中学习单目视觉里程计及三维稠密建图

【泡泡一分钟】从三维流动中学习单目视觉里程计及三维稠密建图

泡泡机器人SLAM

12+阅读 · 2019年2月12日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

泡泡机器人SLAM

22+阅读 · 2019年1月17日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

泡泡机器人SLAM

13+阅读 · 2018年10月7日

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

专知

32+阅读 · 2018年8月14日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

相关论文

VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

Arxiv

0+阅读 · 2023年5月18日

Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams

Arxiv

0+阅读 · 2023年5月18日

Direct Visual Servoing Based on Discrete Orthogonal Moments

Arxiv

0+阅读 · 2023年5月18日

MAST: Multiscale Audio Spectrogram Transformers

Arxiv

0+阅读 · 2023年5月18日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

102+阅读 · 2022年5月11日

MetAug: Contrastive Learning via Meta Feature Augmentation

Arxiv

10+阅读 · 2022年3月10日

Survey: Transformer based Video-Language Pre-training

Arxiv

20+阅读 · 2021年9月21日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

相关基金

基于单测点信号的桥梁损伤诊断与定位的混沌振子及重构相空间方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

稀疏多元传感器网络跟踪用电设备状态的序列解码与部署优化关键问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

时滞细胞神经网络的分支，混沌与同步控制

国家自然科学基金

0+阅读 · 2012年12月31日

分数阶系统的脉冲混沌动力学及稳定性

国家自然科学基金

0+阅读 · 2012年12月31日

顺序激活RARα及TrkB受体介导信号的协同机制对成体急性脊髓损伤神经再生修复的实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

仿人机器人同时定位和三维认知地图创建研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于纳米带场效应晶体管的ppb量级高灵敏SO2半导体传感器研究

国家自然科学基金

0+阅读 · 2012年12月31日

Rayleigh信道统计分析和建模

国家自然科学基金

0+阅读 · 2009年12月31日

连续变量量子纠缠态存储

国家自然科学基金

0+阅读 · 2009年12月31日

增强现实中多目标3D跟踪定位和WH-SIFT特征识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员