Skeleton-based human action recognition has recently become an active research topic, because the compact representation offered by human skeletons opens new directions for the field. Researchers have consequently turned to RGB cameras and other sensors to analyze human actions from extracted skeleton information. Building on the rapid development of deep learning (DL), many skeleton-based action recognition approaches with carefully designed DL architectures have been proposed. However, training a DL model well typically requires sufficient high-quality data, which is costly and labor-intensive to collect. In this paper, we introduce a novel data augmentation method for skeleton-based action recognition that generates high-quality, diverse action sequences. To obtain natural and realistic sequences, we use denoising diffusion probabilistic models (DDPMs) to generate synthetic action sequences, with the generation process guided by a spatial-temporal transformer (ST-Trans). Experimental results show that our method outperforms state-of-the-art (SOTA) motion generation approaches on different naturalness and diversity metrics. They also show that the resulting synthetic data can be applied to existing action recognition models with significant performance improvement.
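As a rough illustration of the diffusion component named above, the following minimal sketch shows the standard closed-form DDPM forward (noising) step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied to a skeleton sequence. The abstract does not give implementation details, so everything here is an assumption made for illustration: the linear noise schedule and its endpoints, the 1000 diffusion steps, the sequence shape (60 frames, 25 joints, 3 coordinates), and the function names. The ST-Trans denoiser that guides generation is not sketched.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (endpoints are assumed)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a sequence shaped [frames][joints][3]."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    # Independent standard Gaussian noise for every joint coordinate.
    noise = [[[rng.gauss(0.0, 1.0) for _ in joint] for joint in frame]
             for frame in x0]
    x_t = [[[signal * c + sigma * e for c, e in zip(joint, eps_joint)]
            for joint, eps_joint in zip(frame, eps_frame)]
           for frame, eps_frame in zip(x0, noise)]
    return x_t, noise


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Hypothetical clean sequence: 60 frames, 25 joints, (x, y, z) per joint.
x0 = [[[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(25)]
      for _ in range(60)]

# Noised sequence halfway through the (assumed) 1000-step schedule.
x_t, noise = q_sample(x0, 500, alpha_bars, rng)
```

In a DDPM, a denoising network is trained to predict `noise` from `x_t` and `t`; in the method described here that role would fall to the transformer-guided model, whose architecture the abstract does not specify.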