Adversarial attacks on skeletal human action recognition (S-HAR) have received significant attention. However, existing methods typically introduce noise-like perturbations that degrade motion quality after the attack, which makes the adversarial motions inherently perceptible given recent advances in S-HAR systems. We find that this degradation stems from the gap between the empirical risk and the true risk in the optimization process of previous adversarial attacks. To address this issue, we propose an attack in which adversarial motions are obtained without compromising motion quality. To minimize the risk gap and preserve motion quality, we propose a distribution-based adversarial attack method that avoids noise-like perturbations. To faithfully evaluate motion quality, we propose a new metric that aligns with human perception of real-world naturalness. Experiments on state-of-the-art S-HAR methods across two datasets demonstrate, through qualitative and quantitative analyses, the superiority of our method in both attack success rate and post-attack motion quality. The success of our quality-preserving attack and distribution-based method raises serious concerns about the robustness of action recognizers and highlights the need for further improvements in this domain.