Human pose and shape (HPS) estimation methods achieve remarkable results. However, current HPS benchmarks are mostly designed to test models in scenarios that are similar to the training data. This can lead to critical failures in real-world applications when the observed data differs significantly from the training data, i.e., when it is out-of-distribution (OOD). It is therefore important to test and improve the OOD robustness of HPS methods. To address this fundamental problem, we develop a simulator that can be controlled in a fine-grained manner using interpretable parameters to explore the manifold of images of human pose, e.g., by varying poses, shapes, and clothes. We introduce a learning-based testing method, termed PoseExaminer, that automatically diagnoses HPS algorithms by searching over the parameter space of human pose images to find failure modes. Our strategy for exploring this high-dimensional parameter space is a multi-agent reinforcement learning system, in which the agents collaborate to explore different parts of the parameter space. We show that PoseExaminer discovers a variety of limitations in current state-of-the-art models that are relevant in real-world scenarios but are missed by current benchmarks. For example, it finds large regions of realistic human poses that are not predicted correctly, as well as reduced performance for humans with skinny and corpulent body shapes. In addition, we show that fine-tuning HPS methods on the failure modes found by PoseExaminer improves their robustness and even their performance on standard benchmarks by a significant margin. The code is available for research purposes.
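The search strategy described in the abstract, several agents jointly exploring a parameter space for regions where a model fails, can be illustrated with a toy sketch. This is not the PoseExaminer implementation and it does not use reinforcement learning: it substitutes a greedy random local search for the learned policies, and a made-up two-dimensional error function for the simulator and HPS model. All names (`toy_error`, `search_failures`, `min_gap`) and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def toy_error(params):
    """Hypothetical stand-in for: render an image, run the HPS model, measure error.

    The error is high near two made-up "failure regions" in a 2-D parameter space.
    """
    centers = [(0.8, 0.8), (-0.7, 0.5)]
    return max(
        1.0 - ((params[0] - cx) ** 2 + (params[1] - cy) ** 2) ** 0.5
        for cx, cy in centers
    )


def distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search_failures(n_agents=4, steps=300, step_size=0.1, min_gap=0.3, seed=0):
    """Greedy multi-agent local search (an assumed simplification, not RL).

    Each agent proposes a small random move and accepts it only if the move
    does not lower the error and does not land within `min_gap` of another
    agent, which nudges agents toward different parts of the space.
    """
    rng = random.Random(seed)
    agents = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_agents)]
    for _ in range(steps):
        for i, pos in enumerate(agents):
            # Propose a perturbed position, clipped to the box [-1, 1]^2.
            cand = [min(1.0, max(-1.0, p + rng.gauss(0, step_size))) for p in pos]
            too_close = any(
                distance(cand, other) < min_gap
                for j, other in enumerate(agents)
                if j != i
            )
            if not too_close and toy_error(cand) >= toy_error(pos):
                agents[i] = cand
    return agents


if __name__ == "__main__":
    for agent in search_failures():
        print([round(v, 2) for v in agent], round(toy_error(agent), 3))
```

In this sketch the separation check only loosely stands in for the collaboration between agents mentioned in the abstract. A faithful version would replace the greedy acceptance rule with learned policies and the toy error with the rendering and evaluation pipeline.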