High frame-rate (HFR) videos in action recognition improve fine-grained expression but reduce the density of spatio-temporal relations and motion information. As a result, traditional data-driven training continuously requires large numbers of video samples. However, samples are not always sufficient in real-world scenarios, which motivates research on few-shot action recognition (FSAR). We observe that most recent FSAR works build the spatio-temporal relations of video samples via temporal alignment after spatial feature extraction, which separates spatial and temporal features within samples. They also capture motion information from a narrow perspective between adjacent frames without considering density, so motion information is captured insufficiently. Therefore, we propose a novel plug-and-play architecture for FSAR called Spatio-tempOral frAme tuPle enhancer (SOAP). The model we design with this architecture is referred to as SOAP-Net. Instead of simple feature extraction, it considers temporal connections between different feature channels and the spatio-temporal relations of features. It also captures comprehensive motion information by using frame tuples, which consist of multiple frames and contain more motion information than adjacent frames. Combining frame tuples of diverse frame counts further provides a broader perspective. SOAP-Net achieves new state-of-the-art performance across well-known benchmarks such as SthSthV2, Kinetics, UCF101, and HMDB51. Extensive empirical evaluations underscore the competitiveness, pluggability, generalization, and robustness of SOAP. The code is released at https://github.com/wenbohuang1002/SOAP.
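To make the idea of frame tuples with diverse frame counts more concrete, the following is a minimal sketch and not the SOAP-Net implementation (the released repository is the authoritative reference). It assumes per-frame feature vectors are already available as a NumPy array of shape `(T, C)`. The function names `frame_tuples`, `tuple_motion`, and `multi_scale_motion` are hypothetical, and the "last minus first frame" difference is a stand-in motion descriptor chosen only for illustration.

```python
import numpy as np


def frame_tuples(num_frames, tuple_size):
    """Return index tuples of `tuple_size` consecutive frames (sliding window, stride 1)."""
    if not 2 <= tuple_size <= num_frames:
        raise ValueError("tuple_size must be between 2 and num_frames")
    return [tuple(range(s, s + tuple_size))
            for s in range(num_frames - tuple_size + 1)]


def tuple_motion(features, tuple_size):
    """Illustrative motion descriptor (an assumption, not the paper's module):
    the feature difference between the last and first frame of each tuple.

    features: array of shape (T, C), one feature vector per frame.
    Returns an array of shape (T - tuple_size + 1, C).
    """
    idx = frame_tuples(features.shape[0], tuple_size)
    return np.stack([features[t[-1]] - features[t[0]] for t in idx])


def multi_scale_motion(features, tuple_sizes=(2, 3, 4)):
    """Average the per-tuple motion for each tuple size, then concatenate across sizes."""
    return np.concatenate(
        [tuple_motion(features, k).mean(axis=0) for k in tuple_sizes]
    )


# Toy input: 8 frames with 2 channels, where frame t has features [2t, 2t + 1].
features = np.arange(16, dtype=float).reshape(8, 2)
descriptor = multi_scale_motion(features)
```

With `tuple_size=2` the tuples reduce to adjacent frame pairs, the narrow view the abstract describes. Larger tuples span more frames, and concatenating several sizes gives a descriptor that mixes short-range and longer-range motion.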