Evaluating Adversarial Robustness of Convolution-based Human Motion Prediction

Human motion prediction has achieved strong performance with the help of convolutional neural networks (CNNs), which facilitates human-machine cooperation. However, no existing work evaluates the risk that adversarial attacks pose to human motion prediction, a risk that may create danger in real applications. An adversarial attack on human motion prediction faces two problems: (1) regarding naturalness, pose data is closely tied to the physical dynamics of the human skeleton, so Lp-norm constraints alone do not constrain adversarial examples well; (2) unlike pixel values in images, pose data varies in scale because of differences in acquisition equipment and data processing, which makes it hard to set fixed attack parameters. To address these problems, we propose a new adversarial attack method that perturbs the input human motion sequence by maximizing the prediction error under physical constraints. Specifically, we introduce an adaptable scheme that lets the attack adjust to the scale of the target pose, together with two physical constraints that enhance the imperceptibility of the adversarial example. Evaluation experiments on three datasets show that the prediction errors of all target models increase significantly, which means that current convolution-based human motion prediction models are easily disturbed by the proposed attack. The quantitative analysis suggests that modeling prior knowledge and semantic information can be key to the adversarial robustness of human motion predictors. The qualitative results indicate that the adversarial sample is hard to notice when compared frame by frame but is relatively easy to detect when the sample is animated.
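To make the idea of a scale-adaptive, physically constrained attack concrete, the following is a minimal sketch and not the method evaluated in this work. It rests on several assumptions that do not come from the abstract: the target model is replaced by a toy linear predictor `W`, the perturbation budget is set relative to the mean absolute joint coordinate through a hypothetical `rel_eps` parameter, and a bone-length penalty weighted by a hypothetical `lam` stands in for a physical constraint, since the abstract does not name the two constraints actually used. All function names are illustrative.

```python
import numpy as np


def prediction_error(x, y, W):
    """Squared error of a toy linear predictor (a stand-in for a CNN)."""
    r = x.reshape(-1) @ W - y
    return float(r @ r)


def error_grad(x, y, W):
    """Analytic gradient of prediction_error with respect to the input x."""
    r = x.reshape(-1) @ W - y
    return (W @ (2.0 * r)).reshape(x.shape)


def bone_penalty_grad(adv, clean, bones):
    """Gradient of sum over frames and bones of (|bone_adv| - |bone_clean|)^2.

    Assumed constraint: bone lengths should stay close to the clean ones.
    """
    grad = np.zeros_like(adv)
    for parent, child in bones:
        diff = adv[:, child] - adv[:, parent]                    # (T, 3)
        length = np.linalg.norm(diff, axis=-1, keepdims=True)    # (T, 1)
        target = np.linalg.norm(
            clean[:, child] - clean[:, parent], axis=-1, keepdims=True
        )
        g = 2.0 * (length - target) * diff / np.maximum(length, 1e-12)
        grad[:, child] += g
        grad[:, parent] -= g
    return grad


def scale_adaptive_attack(x, y, W, bones, rel_eps=0.01, steps=20, lam=1.0):
    """Iterative sign-gradient ascent on (error - lam * bone penalty).

    x: clean input sequence, shape (T, J, 3); y: ground-truth target.
    The budget eps is a fraction of the input's own scale, so the same
    rel_eps applies whether coordinates are in metres or millimetres.
    """
    scale = np.abs(x).mean()      # adaptive scale estimate (assumption)
    eps = rel_eps * scale         # per-coordinate perturbation budget
    alpha = eps / steps           # step size: total movement stays within eps
    adv = x.copy()
    for _ in range(steps):
        g = error_grad(adv, y, W) - lam * bone_penalty_grad(adv, x, bones)
        adv = adv + alpha * np.sign(g)
        adv = x + np.clip(adv - x, -eps, eps)   # safety projection onto budget
    return adv


# Usage on random data: 10 frames, 5 joints in a simple chain.
rng = np.random.default_rng(0)
T, J = 10, 5
bones = [(0, 1), (1, 2), (2, 3), (3, 4)]
x = rng.normal(size=(T, J, 3))
W = rng.normal(size=(T * J * 3, J * 3)) / (T * J * 3)
y = x.reshape(-1) @ W + 0.1 * rng.normal(size=J * 3)
adv = scale_adaptive_attack(x, y, W, bones)
print("clean error:", prediction_error(x, y, W))
print("adversarial error:", prediction_error(adv, y, W))
```

Tying the budget to the data's own scale is one simple way to avoid fixed parameters that fit only one dataset's units. For a real convolutional model, the analytic gradient in `error_grad` would be replaced by automatic differentiation.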
