This paper presents Ske2Grid, a new representation learning framework for improved skeleton-based action recognition. In Ske2Grid, we define a regular convolution operation on a novel grid representation of the human skeleton: a compact, image-like grid patch that is constructed and learned through three novel designs. Specifically, we propose a graph-node index transform (GIT) that constructs a regular grid patch by assigning the nodes of the skeleton graph one by one to the desired grid cells. To ensure that GIT is a bijection and to enrich the expressiveness of the grid representation, an up-sampling transform (UPT) is learned to interpolate the skeleton graph nodes so that the grid patch is completely filled. To address cases where a one-step UPT is too aggressive, and to further exploit the representation capability of the grid patch as its spatial size increases, we propose a progressive learning strategy (PLS) that decouples the UPT into multiple steps and aligns them with multiple paired GITs through a compact cascaded design that is learned progressively. We build networks on top of prevailing graph convolution networks (GCNs) and conduct experiments on six mainstream skeleton-based action recognition datasets. Experiments show that Ske2Grid significantly outperforms existing GCN-based solutions under different benchmark settings, without bells and whistles. Code and models are available at https://github.com/OSVAI/Ske2Grid
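To make the interplay of UPT and GIT concrete, the following is a minimal NumPy sketch and not the released implementation. It assumes a skeleton with 17 joints, a 5x5 grid, and 3 feature channels; the function name `ske2grid_patch` is hypothetical, a random row-normalized matrix stands in for the learned UPT, and a random permutation stands in for the learned GIT.

```python
import numpy as np


def ske2grid_patch(x, upt, git_index, grid_hw):
    """Build an image-like grid patch from skeleton joint features.

    x         : (C, N) features for N skeleton joints.
    upt       : (H*W, N) interpolation weights (stand-in for the learned UPT).
    git_index : (H*W,) permutation; git_index[k] is the flat grid cell that
                receives up-sampled node k (stand-in for the learned GIT).
    grid_hw   : (H, W) spatial size of the grid patch.
    """
    h, w = grid_hw
    c, n = x.shape
    if upt.shape != (h * w, n):
        raise ValueError("upt must have shape (H*W, N)")
    # GIT must be a bijection between up-sampled nodes and grid cells.
    if not np.array_equal(np.sort(git_index), np.arange(h * w)):
        raise ValueError("git_index must be a permutation of range(H*W)")

    # UPT: interpolate N joints into H*W nodes so every cell gets a node.
    up = x @ upt.T  # (C, H*W)

    # GIT: place up-sampled node k into grid cell git_index[k].
    patch = np.empty_like(up)
    patch[:, git_index] = up
    return patch.reshape(c, h, w)


# Illustrative values only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 17))            # 3 channels, 17 joints
upt = rng.random((25, 17))
upt /= upt.sum(axis=1, keepdims=True)       # each new node is a convex mix of joints
git_index = rng.permutation(25)             # one node per grid cell
patch = ske2grid_patch(x, upt, git_index, (5, 5))
print(patch.shape)
```

The resulting `(C, H, W)` array can be fed to an ordinary 2D convolution. In this sketch the progressive strategy would correspond to applying several smaller up-sampling steps in sequence, each with its own index assignment, rather than one large step.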