Graph-based reasoning over skeleton data has emerged as a promising approach for human action recognition. However, prior graph-based methods predominantly take whole temporal sequences as input, so applying them to online inference entails considerable computational redundancy. In this paper, we tackle this issue by reformulating the Spatio-Temporal Graph Convolutional Neural Network as a Continual Inference Network, which can perform step-by-step predictions in time without repeated frame processing. To evaluate our method, we create a continual version of ST-GCN, CoST-GCN, alongside two derived methods with different self-attention mechanisms, CoAGCN and CoS-TR. We investigate weight transfer strategies and architectural modifications for inference acceleration, and perform experiments on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets. While retaining similar predictive accuracy, we observe up to a 109x reduction in time complexity, on-hardware accelerations of 26x, and reductions in maximum allocated memory of 52% during online inference.
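The idea of step-by-step prediction without repeated frame processing can be illustrated on a single temporal convolution. The following is a minimal sketch under simplifying assumptions, not the authors' implementation: it omits the graph convolution and attention components entirely, uses plain Python lists as per-frame features, and the names `window_response`, `clip_temporal_conv`, and `ContinualTemporalConv` are hypothetical. It shows that a layer which caches only the most recent K frames produces the same outputs, one step at a time, as a clip-based layer that sees the whole sequence.

```python
from collections import deque


def window_response(window, kernel):
    """Weighted sum over a K-frame window; kernel[k][c] weights channel c of tap k."""
    return sum(
        w_c * x_c
        for frame, taps in zip(window, kernel)
        for x_c, w_c in zip(frame, taps)
    )


def clip_temporal_conv(frames, kernel):
    """Clip-based baseline: slide the kernel over a complete sequence."""
    k = len(kernel)
    return [window_response(frames[t:t + k], kernel)
            for t in range(len(frames) - k + 1)]


class ContinualTemporalConv:
    """Step-wise variant (hypothetical name): keeps only the last K frames."""

    def __init__(self, kernel):
        self.kernel = kernel
        # A bounded deque drops the oldest frame automatically.
        self.buffer = deque(maxlen=len(kernel))

    def step(self, frame):
        """Consume one new frame; return an output once K frames have been seen."""
        self.buffer.append(frame)
        if len(self.buffer) < len(self.kernel):
            return None  # still warming up, no full window yet
        return window_response(self.buffer, self.kernel)


# Toy 2-channel features for 6 frames and a kernel with K = 3 temporal taps.
frames = [[float(t), float(t * t)] for t in range(6)]
kernel = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0]]

layer = ContinualTemporalConv(kernel)
stepwise = [y for y in (layer.step(f) for f in frames) if y is not None]
```

In an online setting, the clip-based function would have to be re-run on a full window of frames for every new prediction, whereas `step` touches each incoming frame once and reuses the cached ones. Stacking such layers is where the savings compound, though the exact gains depend on the architecture and are not reproduced by this toy example.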