Extracting multiscale contextual information and higher-order correlations from skeleton sequences using Graph Convolutional Networks (GCNs) alone is inadequate for effective action classification. Hypergraph convolution addresses these issues but cannot capture long-range dependencies. Transformers have proven effective at modeling such dependencies and exposing complex contextual features. We propose an Autoregressive Adaptive HyperGraph Transformer (AutoregAd-HGformer) model for in-phase (autoregressive and discrete) and out-phase (adaptive) hypergraph generation. The vector-quantized in-phase hypergraph, equipped with powerful autoregressive learned priors, produces a more robust and informative representation suitable for hyperedge formation. The out-phase hypergraph generator provides a model-agnostic hyperedge learning technique that aligns hyperedge attributes with the input skeleton embedding. Hybrid (supervised and unsupervised) learning in AutoregAd-HGformer explores action-dependent features along the spatial, temporal, and channel dimensions. Extensive experiments and ablation studies show that our model outperforms state-of-the-art hypergraph architectures on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
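To make the hypergraph-convolution backdrop concrete, the following is a minimal sketch of a single HGNN-style hypergraph convolution (with identity hyperedge weights), showing how hyperedges aggregate higher-order correlations among skeleton joints. The joint grouping, feature sizes, and function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution: X' = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Theta.
    X: (N, C) node features; H: (N, E) incidence matrix; Theta: (C, C_out) weights.
    Hyperedge weights are assumed to be the identity for simplicity."""
    Dv = np.diag(H.sum(axis=1))                  # node degree matrix
    De = np.diag(H.sum(axis=0))                  # hyperedge degree matrix
    Dv_inv_sqrt = np.linalg.inv(np.sqrt(Dv))
    De_inv = np.linalg.inv(De)
    return Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt @ X @ Theta

# Toy example: 5 skeleton joints grouped into 2 hyperedges
# (e.g., an "arm" group and a "leg" group sharing one torso joint).
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1],
              [0, 1]], dtype=float)
X = np.random.randn(5, 3)        # 3-dimensional joint features
Theta = np.random.randn(3, 4)    # project to 4 output channels
out = hypergraph_conv(X, H, Theta)
print(out.shape)  # (5, 4)
```

Each output row mixes features from every joint that shares a hyperedge with it, which is the higher-order aggregation that pairwise GCN edges cannot express in a single step.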