Few-shot action recognition aims to recognize novel action classes from only a small number of labeled training samples. In this work, we propose an approach that first summarizes each video into compound prototypes, consisting of a group of global prototypes and a group of focused prototypes, and then measures video similarity based on these prototypes. Each global prototype is encouraged to summarize a specific aspect of the entire video, for example, the start or the evolution of the action. Since no explicit annotation is available to supervise the global prototypes, we complement them with a group of focused prototypes that focus on certain timestamps in the video. We measure video similarity by matching the compound prototypes of the support and query videos. The global prototypes are matched directly so that videos are compared from the same perspective, for example, whether two actions start similarly. For the focused prototypes, since the same action can appear at different temporal positions across videos, we apply bipartite matching, which allows actions with different temporal positions and shifts to be compared. Experiments demonstrate that our proposed method achieves state-of-the-art results on multiple benchmarks.
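The two matching strategies described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the use of cosine similarity, the equal weighting of the two groups, and the brute-force search over assignments are all assumptions made for illustration. In practice, a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment` would replace the exhaustive search.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two prototype vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def global_similarity(support, query):
    """Direct matching: the i-th support prototype is compared only
    with the i-th query prototype, so each pair shares one perspective."""
    assert len(support) == len(query)
    return sum(cosine(s, q) for s, q in zip(support, query)) / len(support)


def focused_similarity(support, query):
    """Bipartite matching: search for the one-to-one assignment of
    support prototypes to query prototypes with the highest total
    similarity. Brute force, so only suitable for a few prototypes."""
    assert len(support) == len(query)
    n = len(support)
    best = -math.inf
    for perm in permutations(range(n)):
        score = sum(cosine(support[i], query[j]) for i, j in enumerate(perm))
        best = max(best, score)
    return best / n


def compound_similarity(s_global, s_focused, q_global, q_focused, weight=0.5):
    """Combine both groups; the weighting scheme is hypothetical."""
    return (weight * global_similarity(s_global, q_global)
            + (1.0 - weight) * focused_similarity(s_focused, q_focused))


# Toy prototypes: the query's focused prototypes hold the same content
# as the support's, but in swapped temporal order.
support_global, query_global = [[1.0, 0.0]], [[1.0, 0.0]]
support_focused = [[1.0, 0.0], [0.0, 1.0]]
query_focused = [[0.0, 1.0], [1.0, 0.0]]

shifted_direct = global_similarity(support_focused, query_focused)
shifted_bipartite = focused_similarity(support_focused, query_focused)
overall = compound_similarity(support_global, support_focused,
                              query_global, query_focused)
```

In this toy case, direct matching of the swapped prototypes scores them as dissimilar, while bipartite matching recovers the correspondence, which is the motivation given above for treating the focused prototypes differently from the global ones.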