Video sequences exhibit significant nuisance variations (undesired effects) in the speed of actions, temporal locations, and subjects' poses, leading to temporal-viewpoint misalignment when comparing two sets of frames or evaluating the similarity of two sequences. Thus, we propose Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE) for sequence pairs. In particular, we focus on 3D skeleton sequences, whose camera and subjects' poses can be easily manipulated in 3D. We evaluate JEANIE on skeletal Few-shot Action Recognition (FSAR), where matching the temporal blocks (temporal chunks that make up a sequence) of support-query sequence pairs well, by factoring out nuisance variations, is essential due to the limited samples of novel classes. Given a query sequence, we create several views of it by simulating several camera locations. We then match a support sequence against the view-simulated query sequences, as in the popular Dynamic Time Warping (DTW). Specifically, each support temporal block can be matched to the query temporal block with the same or adjacent (next) temporal index, and across adjacent camera views, to achieve joint local temporal-viewpoint warping. JEANIE selects the smallest distance among matching paths with different temporal-viewpoint warping patterns, an advantage over DTW, which performs only temporal alignment. We also propose an unsupervised FSAR approach, akin to clustering of sequences, with JEANIE as the distance measure. JEANIE achieves state-of-the-art results on NTU-60, NTU-120, Kinetics-skeleton and UWA3D Multiview Activity II for supervised and unsupervised FSAR, and for their meta-learning-inspired fusion.
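The joint temporal-viewpoint warping described in the abstract can be illustrated with a small dynamic program. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a precomputed distance tensor `dist[v, i, j]` between support block `i` and query block `j` seen from simulated view `v`, a single view axis, DTW-style temporal steps, a hard minimum over paths, and a path that may start and end in any view. The names `jeanie_distance`, `dtw_distance`, and `view_radius` are hypothetical.

```python
import numpy as np


def jeanie_distance(dist, view_radius=1):
    """Smallest accumulated distance over joint temporal-viewpoint paths.

    dist[v, i, j] is an assumed precomputed distance between support
    block i and query block j rendered from simulated camera view v.
    At each step the path advances in time as in DTW and may move to a
    view at most `view_radius` indices away (assumption).
    """
    n_views, n_sup, n_qry = dist.shape
    acc = np.full((n_views, n_sup, n_qry), np.inf)
    acc[:, 0, 0] = dist[:, 0, 0]  # assumption: the path may start in any view

    for i in range(n_sup):
        for j in range(n_qry):
            if i == 0 and j == 0:
                continue
            for v in range(n_views):
                # Neighbouring views that are allowed as predecessors.
                lo = max(0, v - view_radius)
                hi = min(n_views, v + view_radius + 1)
                best = np.inf
                if i > 0:  # advance the support index only
                    best = min(best, acc[lo:hi, i - 1, j].min())
                if j > 0:  # advance the query index only
                    best = min(best, acc[lo:hi, i, j - 1].min())
                if i > 0 and j > 0:  # advance both indices
                    best = min(best, acc[lo:hi, i - 1, j - 1].min())
                acc[v, i, j] = dist[v, i, j] + best

    # Assumption: the path may end in any view.
    return float(acc[:, -1, -1].min())


def dtw_distance(dist2d):
    """Plain DTW as the single-view special case of the sketch above."""
    return jeanie_distance(dist2d[None, :, :], view_radius=0)
```

With `view_radius=0` the path cannot change view, so the sketch reduces to running DTW independently in each view and keeping the best one. With a larger radius, a path may switch between adjacent views, so its distance can never exceed the best single-view DTW distance under the same step rule.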