Quantum Human Action Recognition (HAR) is an active research area in human-computer interaction, used to monitor the activities of elderly and disabled individuals affected by physical and mental health conditions. In recent years, skeleton-based HAR has received much attention because skeleton data can handle changes in posture, body size, camera viewpoint, and complex backgrounds. A key strength of ST-GCN is that it automatically learns spatial and temporal patterns from skeleton sequences; however, its limited receptive field restricts it to short-range correlations, whereas understanding human action also requires modeling long-range dependencies. To address this issue, we developed a quantum spatial-temporal relative transformer (ST-RTR) model. ST-RTR introduces joint and relay nodes, which allow efficient communication and data transmission within the network. These nodes break the fixed spatial and temporal skeleton topologies, enabling the model to better capture long-range human actions. Furthermore, we combine quantum ST-RTR with a fusion model for further performance improvements. To assess the performance of the quantum ST-RTR method, we conducted experiments on three skeleton-based HAR benchmarks: NTU RGB+D 60, NTU RGB+D 120, and UAV-Human. The model improved cross-subject (CS) and cross-view (CV) accuracy by 2.11% and 1.45% on NTU RGB+D 60, and by 1.25% and 1.05% on NTU RGB+D 120; on the UAV-Human dataset, accuracy improved by 2.54%. The experimental results show that the proposed ST-RTR model significantly improves action recognition compared with the standard ST-GCN method.
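The relay-node idea described above can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the function names (`attention`, `relay_update`) and the single-round, mean-initialized relay are hypothetical simplifications, assumed here only to show how a relay node gives every pair of joints a two-hop communication path regardless of the skeleton's bone topology.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    # scaled dot-product attention of one query vector over key/value sets
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x
    return out

def relay_update(joints):
    """One round of relay-node message passing (hypothetical simplification):
    the relay aggregates global context from all joints, then each joint
    attends to {itself, relay}, so any two joints can exchange information
    in two hops without a direct skeletal edge between them."""
    relay = [sum(col) / len(joints) for col in zip(*joints)]  # mean-initialized relay
    relay = attention(relay, joints, joints)                  # relay gathers global context
    return [attention(j, [j, relay], [j, relay]) for j in joints]

# toy skeleton: 4 joints with 3-dimensional features
joints = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0],
          [1.0, 1.0, 0.0]]
updated = relay_update(joints)
```

In this toy example, after one relay round each updated joint feature already mixes in information from every other joint, which is the long-range behavior the abstract attributes to the relay nodes; a full model would stack such rounds with learned projections over both the spatial and temporal axes.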