Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition

A primary challenge in few-shot action recognition is the lack of sufficient video data for training. To address this issue, current methods in this field mainly focus on devising algorithms at the feature level, while little attention is paid to processing the input video data. Moreover, existing frame sampling strategies may omit critical action information in the temporal and spatial dimensions, which further reduces video utilization efficiency. In this paper, we propose a novel video frame sampler for few-shot action recognition, in which task-specific spatial-temporal frame sampling is achieved via a temporal selector (TS) and a spatial amplifier (SA). Specifically, our sampler first scans the whole video at a small computational cost to obtain a global perception of the video frames. The TS then selects the top-T frames that contribute most significantly, and the SA subsequently emphasizes the discriminative information of each frame by amplifying critical regions under the guidance of saliency maps. We further adopt task-adaptive learning to dynamically adjust the sampling strategy according to the episode task at hand. The implementations of both TS and SA are differentiable for end-to-end optimization, facilitating seamless integration of our proposed sampler with most few-shot action recognition methods. Extensive experiments show a significant boost in performance on various benchmarks, including long-term videos. The code is available at https://github.com/R00Kie-Liu/Sampler
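
The two sampling steps named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract states that TS and SA are differentiable, whereas the sketch uses a hard top-T selection and a simple, non-differentiable inverse-transform resampling chosen here only for illustration. The function names `select_top_t`, `amplify_axis`, and `amplify_frame`, and the assumption that per-frame scores and saliency maps are already available, are all hypothetical.

```python
import numpy as np


def select_top_t(frame_scores, t):
    """Return the indices of the t highest-scoring frames, in temporal order.

    Hard (non-differentiable) stand-in for the temporal selector.
    """
    scores = np.asarray(frame_scores, dtype=float)
    top = np.argsort(-scores, kind="stable")[:t]  # t best frames by score
    return np.sort(top)                           # restore temporal order


def amplify_axis(saliency_1d, out_len):
    """Pick source indices along one axis by inverse-transform sampling.

    Positions with high saliency are picked more often, so the
    corresponding region appears enlarged in the resampled frame.
    """
    s = np.asarray(saliency_1d, dtype=float) + 1e-6  # avoid a flat CDF
    cdf = np.cumsum(s) / s.sum()
    targets = (np.arange(out_len) + 0.5) / out_len   # evenly spaced quantiles
    return np.searchsorted(cdf, targets).clip(0, len(s) - 1)


def amplify_frame(frame, saliency):
    """Resample an (H, W) or (H, W, C) frame guided by an (H, W) saliency map."""
    h, w = saliency.shape
    rows = amplify_axis(saliency.sum(axis=1), h)  # row-wise saliency marginal
    cols = amplify_axis(saliency.sum(axis=0), w)  # column-wise saliency marginal
    return frame[rows][:, cols]


if __name__ == "__main__":
    # Assumed inputs: one importance score per frame from a lightweight scan.
    keep = select_top_t([0.1, 0.9, 0.3, 0.8, 0.2], t=3)
    print(keep)

    # Toy frame with a single salient pixel in the centre.
    frame = np.arange(25).reshape(5, 5)
    saliency = np.zeros((5, 5))
    saliency[2, 2] = 1.0
    print(amplify_frame(frame, saliency))
```

Sorting the selected indices keeps the chosen frames in their original temporal order, which matters for action recognition. Sampling each axis from the marginal saliency distribution keeps the output the same size as the input while devoting more pixels to salient regions.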

Related content

Few-shot learning

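Few-shot learning (FSL) addresses the problem of improving neural network performance when only a small amount of data is available. A commonly used family of methods in FSL is meta-learning. Like ordinary neural network training, meta-learning consists of a training phase and a testing phase, but these are called meta-training and meta-testing, respectively.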