STMixer: A One-Stage Sparse Action Detector - 专知论文

会员服务 ·

0

动作检测 · 检测器 · 稀疏 · 解码 · 人体检测 ·

2023 年 3 月 28 日

STMixer: A One-Stage Sparse Action Detector

翻译：STMixer: 一种单阶段稀疏动作检测器

Tao Wu,Mengqi Cao,Ziteng Gao,Gangshan Wu,Limin Wang

from arxiv, Accepted by CVPR 2023

Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and cannot capture context information outside the bounding box. Recently, a few query-based action detectors are proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding, thus suffering from the issues of inferior performance or slower convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which endows our STMixer with the flexibility of mining a set of discriminative features from the entire spatiotemporal domain. Second, we devise a dual-branch feature mixing module, which allows our STMixer to dynamically attend to and mix video features along the spatial and the temporal dimension respectively for better feature decoding. Coupling these two designs with a video backbone yields an efficient end-to-end action detector. Without bells and whistles, our STMixer obtains the state-of-the-art results on the datasets of AVA, UCF101-24, and JHMDB.

翻译：传统视频动作检测器通常采用两阶段流水线，即先使用人物检测器生成演员框，再通过3D RoIAlign提取演员专属特征进行分类。这种检测范式需要多阶段训练与推理，且无法捕获边界框之外的上下文信息。近期，少数基于查询的动作检测器被提出，能够以端到端的方式预测动作实例。然而，这些方法在特征采样与解码方面仍缺乏适应性，从而导致性能欠佳或收敛速度缓慢的问题。本文提出一种新型单阶段稀疏动作检测器，命名为STMixer。STMixer基于两大核心设计：首先，我们提出基于查询的自适应特征采样模块，赋予STMixer从整个时空域中挖掘一组判别性特征的灵活性；其次，我们设计了双分支特征混合模块，使STMixer能够分别沿空间与时间维度动态关注并混合视频特征，从而实现更优的特征解码。将这两项设计与视频骨干网络耦合，即构成高效的端到端动作检测器。无需额外复杂机制，我们的STMixer在AVA、UCF101-24和JHMDB数据集上均取得了最先进的结果。

0

相关内容

动作检测

CVPR 2022 Oral | 南京大学AdaMixer：基于快速收敛查询的目标检测器

CVPR 2022 Oral | 南京大学AdaMixer：基于快速收敛查询的目标检测器

专知会员服务

11+阅读 · 2022年4月10日

【CVPR 2022】基于时空解耦与重耦的RGB-D动作识别 Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

【CVPR 2022】基于时空解耦与重耦的RGB-D动作识别 Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

专知会员服务

14+阅读 · 2022年3月19日

【AAAI2022】基于交互式transformer和暹罗网络的视频目标分割

【AAAI2022】基于交互式transformer和暹罗网络的视频目标分割

专知会员服务

24+阅读 · 2022年2月6日

【AAAI2022】锚点DETR：基于transformer检测器的查询设计

【AAAI2022】锚点DETR：基于transformer检测器的查询设计

专知会员服务

13+阅读 · 2021年12月31日

【NeurIPS2021】ResT:一个有效的视觉识别转换器

【NeurIPS2021】ResT:一个有效的视觉识别转换器

专知会员服务

23+阅读 · 2021年10月25日

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

专知会员服务

51+阅读 · 2020年5月28日

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

专知会员服务

18+阅读 · 2019年11月30日

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

专知会员服务

12+阅读 · 2019年11月15日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

首个目标检测扩散模型，比Faster R-CNN、DETR好，从随机框中直接检测

首个目标检测扩散模型，比Faster R-CNN、DETR好，从随机框中直接检测

机器之心

1+阅读 · 2022年11月21日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

视频目标检测：Flow-based

视频目标检测：Flow-based

极市平台

22+阅读 · 2019年5月27日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

CVPR2019 | Stereo R-CNN 3D 目标检测

CVPR2019 | Stereo R-CNN 3D 目标检测

极市平台

27+阅读 · 2019年3月10日

ECCV2018目标检测（object detection）算法总览（部分含代码）

ECCV2018目标检测（object detection）算法总览（部分含代码）

极市平台

30+阅读 · 2018年12月29日

Single-Shot Object Detection with Enriched Semantics

Single-Shot Object Detection with Enriched Semantics

统计学习与视觉计算组

14+阅读 · 2018年8月29日

论文笔记之Feature Selective Networks for Object Detection

论文笔记之Feature Selective Networks for Object Detection

统计学习与视觉计算组

21+阅读 · 2018年7月26日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

基于子空间分析的低复杂度联合稀疏恢复方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

滤波器组格型结构自由参数的初始化研究

国家自然科学基金

0+阅读 · 2013年12月31日

部分遮挡下人脸人耳融合识别方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

压缩感知域高光谱图像稀疏解混方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

众核体系架构并行计算模型与算法自适应调优框架研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于子空间变换的机载宽带杂波抑制与扩展目标检测方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于胸腺恢复的AIDS患者T淋巴细胞受体库多样性研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于MAP的低复杂度LDPC译码算法理论和方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

广义Kloosterman和的均值估计

国家自然科学基金

1+阅读 · 2011年12月31日

基于HHT的超光谱图像高精度分类算法研究

国家自然科学基金

0+阅读 · 2009年12月31日

XFormer: Fast and Accurate Monocular 3D Body Capture

Arxiv

0+阅读 · 2023年5月18日

Deep Learning for UAV-based Object Detection and Tracking: A Survey

Arxiv

64+阅读 · 2021年10月25日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition

Arxiv

12+阅读 · 2021年5月30日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Arxiv

17+阅读 · 2020年3月31日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

3D Backbone Network for 3D Object Detection

Arxiv

12+阅读 · 2019年1月24日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

MSDNN: Multi-Scale Deep Neural Network for Salient Object Detection

Arxiv

21+阅读 · 2018年1月12日

VIP会员

文章信息

相关主题

最新内容

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

10+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

9+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

3+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

5+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

7+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

5+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

7+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

8+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

7+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

9+阅读 · 7月20日

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

9+阅读 · 7月20日

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

15+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

8+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

10+阅读 · 7月19日

相关VIP内容

CVPR 2022 Oral | 南京大学AdaMixer：基于快速收敛查询的目标检测器

CVPR 2022 Oral | 南京大学AdaMixer：基于快速收敛查询的目标检测器

专知会员服务

11+阅读 · 2022年4月10日

【CVPR 2022】基于时空解耦与重耦的RGB-D动作识别 Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

【CVPR 2022】基于时空解耦与重耦的RGB-D动作识别 Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

专知会员服务

14+阅读 · 2022年3月19日

【AAAI2022】基于交互式transformer和暹罗网络的视频目标分割

【AAAI2022】基于交互式transformer和暹罗网络的视频目标分割

专知会员服务

24+阅读 · 2022年2月6日

【AAAI2022】锚点DETR：基于transformer检测器的查询设计

【AAAI2022】锚点DETR：基于transformer检测器的查询设计

专知会员服务

13+阅读 · 2021年12月31日

【NeurIPS2021】ResT:一个有效的视觉识别转换器

【NeurIPS2021】ResT:一个有效的视觉识别转换器

专知会员服务

23+阅读 · 2021年10月25日

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

专知会员服务

51+阅读 · 2020年5月28日

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

专知会员服务

18+阅读 · 2019年11月30日

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

专知会员服务

12+阅读 · 2019年11月15日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

《无人机对海面作战影响评估》

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

相关资讯

首个目标检测扩散模型，比Faster R-CNN、DETR好，从随机框中直接检测

首个目标检测扩散模型，比Faster R-CNN、DETR好，从随机框中直接检测

机器之心

1+阅读 · 2022年11月21日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

视频目标检测：Flow-based

视频目标检测：Flow-based

极市平台

22+阅读 · 2019年5月27日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

CVPR2019 | Stereo R-CNN 3D 目标检测

CVPR2019 | Stereo R-CNN 3D 目标检测

极市平台

27+阅读 · 2019年3月10日

ECCV2018目标检测（object detection）算法总览（部分含代码）

ECCV2018目标检测（object detection）算法总览（部分含代码）

极市平台

30+阅读 · 2018年12月29日

Single-Shot Object Detection with Enriched Semantics

Single-Shot Object Detection with Enriched Semantics

统计学习与视觉计算组

14+阅读 · 2018年8月29日

论文笔记之Feature Selective Networks for Object Detection

论文笔记之Feature Selective Networks for Object Detection

统计学习与视觉计算组

21+阅读 · 2018年7月26日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

相关论文

XFormer: Fast and Accurate Monocular 3D Body Capture

Arxiv

0+阅读 · 2023年5月18日

Deep Learning for UAV-based Object Detection and Tracking: A Survey

Arxiv

64+阅读 · 2021年10月25日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition

Arxiv

12+阅读 · 2021年5月30日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Arxiv

17+阅读 · 2020年3月31日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

3D Backbone Network for 3D Object Detection

Arxiv

12+阅读 · 2019年1月24日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

MSDNN: Multi-Scale Deep Neural Network for Salient Object Detection

Arxiv

21+阅读 · 2018年1月12日

相关基金

基于子空间分析的低复杂度联合稀疏恢复方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

滤波器组格型结构自由参数的初始化研究

国家自然科学基金

0+阅读 · 2013年12月31日

部分遮挡下人脸人耳融合识别方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

压缩感知域高光谱图像稀疏解混方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

众核体系架构并行计算模型与算法自适应调优框架研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于子空间变换的机载宽带杂波抑制与扩展目标检测方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于胸腺恢复的AIDS患者T淋巴细胞受体库多样性研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于MAP的低复杂度LDPC译码算法理论和方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

广义Kloosterman和的均值估计

国家自然科学基金

1+阅读 · 2011年12月31日

基于HHT的超光谱图像高精度分类算法研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员