Zero-shot skeleton action recognition is a non-trivial task that requires robust generalization to unseen classes using only prior knowledge from seen classes and shared semantics. Existing methods typically build skeleton-semantics interactions through uncontrollable mappings and conspicuous representations, and thus can hardly capture the intricate, fine-grained relationships needed for effective cross-modal transfer. To address these issues, we propose a novel dyNamically Evolving dUal skeleton-semantic syneRgistic framework guided by cOntext-aware side informatioN (dubbed Neuron), which explores finer-grained cross-modal correspondence from micro to macro perspectives at both the spatial and temporal levels. Concretely, 1) we first construct spatial-temporal evolving micro-prototypes and integrate dynamic context-aware side information to capture the intricate and synergistic skeleton-semantic correlations step by step, progressively refining cross-modal alignment; and 2) we introduce spatial compression and temporal memory mechanisms to guide the growth of the spatial-temporal micro-prototypes, enabling them to absorb structure-related spatial representations and regularity-dependent temporal patterns. Notably, these processes are analogous to the learning and growth of neurons, equipping the framework with the capacity to generalize to novel, unseen action categories. Extensive experiments on various benchmark datasets demonstrate the superiority of the proposed method.
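The evolving-prototype idea can be illustrated with a toy sketch. The following minimal NumPy example is purely illustrative, not the authors' implementation: all dimensions, update coefficients, and variable names (`protos`, `memory`, `side`) are hypothetical stand-ins for the micro-prototypes, temporal memory, and context-aware side information the abstract describes. Spatial compression is approximated by attention pooling over joint features, and temporal memory by an exponential moving average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not specified in the abstract)
D = 16        # shared skeleton-semantic embedding dimension
P = 4         # number of micro-prototypes
J = 20        # skeleton joints per frame
T = 8         # temporal steps

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

protos = rng.standard_normal((P, D))   # spatial-temporal micro-prototypes
memory = np.zeros((P, D))              # temporal memory (EMA of prototypes)

for t in range(T):
    skel = rng.standard_normal((J, D))  # per-frame joint features (toy data)
    side = rng.standard_normal(D)       # context-aware side information (toy)

    # Spatial compression: attention-pool joint features into each prototype,
    # absorbing structure-related spatial representations.
    attn = softmax(protos @ skel.T / np.sqrt(D), axis=-1)   # (P, J)
    pooled = attn @ skel                                     # (P, D)

    # Dynamic evolution: blend pooled evidence and side information
    # into the prototypes step by step (coefficients are illustrative).
    protos = 0.8 * protos + 0.15 * pooled + 0.05 * side

    # Temporal memory: an EMA accumulates regularity-dependent patterns.
    memory = 0.9 * memory + 0.1 * protos

# Macro-level representation, e.g. for cross-modal alignment with
# class semantics of seen/unseen actions.
macro = memory.mean(axis=0)
print(macro.shape)  # (D,)
```

In a real system the blend coefficients would be learned and the toy random features would come from skeleton and text encoders; the sketch only shows the micro-to-macro flow: per-step spatial pooling, prototype evolution, then temporal aggregation.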