Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision. Recent state-of-the-art models for SAR are primarily based on graph convolutional neural networks (GCNs), which are powerful at extracting the spatial information of skeleton data. However, it is not yet clear whether such GCN-based models can effectively capture the temporal dynamics of human action sequences. To this end, we propose the DevLSTM module, which exploits the path development, a principled and parsimonious representation of sequential data that leverages Lie group structure. The path development, which originates from rough path theory, can effectively capture the order of events in high-dimensional streamed data while reducing its dimension massively, and consequently enhances the LSTM module substantially. Our proposed G-DevLSTM module can be conveniently plugged into the temporal graph, complementing existing advanced GCN-based models. Our empirical studies on the NTU60, NTU120 and Chalearn2013 datasets demonstrate that our proposed hybrid model significantly outperforms the current best-performing methods in SAR tasks. The code is available at https://github.com/DeepIntoStreams/GCN-DevLSTM.
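
To make the notion of path development concrete, the following is a minimal NumPy sketch and not the implementation in the linked repository. It assumes a piecewise-linear path and uses the special orthogonal group SO(n) as an example Lie group; the function names (`expm_taylor`, `skew_basis_map`, `path_development`) and the random, untrained generators are hypothetical choices made for illustration. The development multiplies the matrix exponentials of a linear map applied to successive increments, so the output is a fixed-size n × n matrix whatever the sequence length, and it depends on the order of the increments.

```python
import numpy as np


def expm_taylor(a, terms=20, squarings=8):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    scaled = a / (2 ** squarings)
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def skew_basis_map(weights):
    """Map raw weights of shape (d, n, n) to anti-symmetric matrices, i.e. elements of so(n)."""
    return weights - np.transpose(weights, (0, 2, 1))


def path_development(path, generators):
    """Development of a piecewise-linear path into SO(n).

    path:       (T, d) array of samples.
    generators: (d, n, n) anti-symmetric matrices (assumed fixed here; a model would learn them).
    Returns the ordered product of exp(M(dx_t)), where M(dx) = sum_j dx[j] * generators[j].
    """
    dev = np.eye(generators.shape[1])
    for dx in np.diff(path, axis=0):
        lie_element = np.tensordot(dx, generators, axes=1)  # shape (n, n)
        dev = dev @ expm_taylor(lie_element)
    return dev


rng = np.random.default_rng(0)
generators = skew_basis_map(rng.normal(size=(3, 4, 4)))   # d = 3 channels, n = 4
increments = rng.normal(size=(49, 3)) * 0.1
path = np.vstack([np.zeros(3), np.cumsum(increments, axis=0)])  # T = 50 samples

dev = path_development(path, generators)

# Same increments in reverse order: same endpoint, different development.
shuffled_path = np.vstack([np.zeros(3), np.cumsum(increments[::-1], axis=0)])
dev_shuffled = path_development(shuffled_path, generators)
```

In this sketch a 50 × 3 path is summarised by a 4 × 4 orthogonal matrix. Traversing the path backwards yields the inverse matrix, whereas reordering the increments generally yields a different matrix even though the endpoint is unchanged, which is the sense in which the development is sensitive to the order of events.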