Skeleton-based human action recognition has recently become an active research topic, because the compact representation offered by human skeletons gives the field fresh momentum. As a result, researchers have begun to recognize the value of analyzing human actions by extracting skeleton information from RGB or other sensor data. Driven by the rapid development of deep learning (DL), a large number of skeleton-based action recognition approaches built on carefully designed DL architectures have been proposed. However, a well-trained DL model requires sufficient high-quality data, which is hard to obtain without considerable expense and human labor. In this paper, we introduce a novel data augmentation method for skeleton-based action recognition that generates high-quality and diverse action sequences. To obtain natural and realistic action sequences, we use denoising diffusion probabilistic models (DDPMs) to generate synthetic action sequences, with the generation process guided by a spatial-temporal transformer (ST-Trans). Experimental results show that our method outperforms state-of-the-art (SOTA) motion generation approaches on several naturalness and diversity metrics. They also show that the resulting high-quality synthetic data can be applied to existing action recognition models, yielding significant performance improvements.
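As a rough illustration of the DDPM component mentioned above, the following minimal sketch shows only the standard forward (noising) process applied to a toy skeleton sequence. It is not the method of this paper: the ST-Trans guidance and the learned reverse (denoising) network are omitted. The function names are hypothetical, and the skeleton layout (frames × joints × 3D coordinates), the linear noise schedule, and its default values (taken from the original DDPM formulation) are all assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(sequence, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a sequence shaped [frames][joints][3].

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [
        [
            [signal_scale * coord + noise_scale * rng.gauss(0.0, 1.0) for coord in joint]
            for joint in frame
        ]
        for frame in sequence
    ]


# Toy clip (hypothetical data): 4 frames, 5 joints, 3D coordinates per joint.
rng = random.Random(0)
clip = [[[0.1 * f, 0.2 * j, 0.0] for j in range(5)] for f in range(4)]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# At the last step almost no signal remains, so x_T is close to pure Gaussian noise.
noisy_clip = q_sample(clip, 999, alpha_bars, rng)
```

In a full pipeline, a network would be trained to reverse this corruption step by step; according to the abstract, that reverse process is where the ST-Trans guidance enters.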