POCO: 3D Pose and Shape Estimation with Confidence

The regression of 3D Human Pose and Shape (HPS) from an image is becoming increasingly accurate. This makes the results useful for downstream tasks such as human action recognition or 3D graphics. Yet no regressor is perfect, and accuracy can be affected by ambiguous image evidence or by poses and appearances that were unseen during training. Most current HPS regressors, however, do not report the confidence of their outputs, so downstream tasks cannot distinguish accurate estimates from inaccurate ones. To address this, we develop POCO, a novel framework for training HPS regressors to estimate not only a 3D human body but also its confidence, in a single feed-forward pass. Specifically, POCO estimates both the 3D body pose and a per-sample variance. The key idea is a Dual Conditioning Strategy (DCS) for regressing uncertainty that is highly correlated with pose reconstruction quality. The POCO framework can be applied to any HPS regressor, and here we evaluate it by modifying HMR, PARE, and CLIFF. In all cases, training the network to reason about uncertainty helps it estimate 3D pose more accurately. This improvement was not our goal; it is modest but consistent. Our main motivation is to provide uncertainty estimates for downstream tasks, and we demonstrate this in two ways. (1) We use the confidence estimates to bootstrap HPS training: given unlabelled image data, we take the confident estimates of a POCO-trained regressor as pseudo ground truth, and retraining with this automatically curated data improves accuracy. (2) We exploit uncertainty in video pose estimation by automatically identifying uncertain frames (e.g., due to occlusion) and inpainting them from confident frames. Code and models will be available for research at https://poco.is.tue.mpg.de.
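
The abstract does not spell out POCO's loss or the Dual Conditioning Strategy, so what follows is not the authors' implementation. It is a minimal, hypothetical sketch of how a per-sample variance can be trained and then reused for the two downstream uses described above. It assumes a standard heteroscedastic Gaussian negative log-likelihood with one predicted log-variance per sample. It keeps only confident estimates as pseudo ground truth and replaces uncertain video frames by linear interpolation between the nearest confident frames. The function names, the threshold `max_log_var`, and the use of flat per-frame vectors (e.g., 3D joint positions) are illustrative assumptions.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def gaussian_nll(pred: Sequence[float], target: Sequence[float], log_var: float) -> float:
    """Heteroscedastic Gaussian negative log-likelihood (constant term dropped).

    Assumes an isotropic Gaussian with variance exp(log_var) shared by all
    dimensions of one sample. This is a generic stand-in, NOT POCO's actual
    loss or its Dual Conditioning Strategy.
    """
    sq_err = sum((p - t) ** 2 for p, t in zip(pred, target))
    dim = len(pred)
    # A large error can be "explained" by a large variance, but the dim * log_var
    # term penalises predicting high variance everywhere.
    return 0.5 * (sq_err * math.exp(-log_var) + dim * log_var)


def select_pseudo_labels(preds: Sequence[T], log_vars: Sequence[float], max_log_var: float) -> List[T]:
    """Keep only confident estimates (low predicted variance) as pseudo ground truth.

    `max_log_var` is a hypothetical threshold; how POCO curates data is not
    specified here.
    """
    return [p for p, s in zip(preds, log_vars) if s <= max_log_var]


def inpaint_uncertain(frames: Sequence[Sequence[float]], log_vars: Sequence[float],
                      max_log_var: float) -> List[List[float]]:
    """Replace uncertain frames by linear interpolation between the nearest
    confident frames (hypothetical stand-in for video inpainting)."""
    confident = [i for i, s in enumerate(log_vars) if s <= max_log_var]
    if not confident:
        return [list(f) for f in frames]  # no confident frame to inpaint from
    out: List[List[float]] = []
    for i, frame in enumerate(frames):
        if log_vars[i] <= max_log_var:
            out.append(list(frame))
            continue
        prev = max((c for c in confident if c < i), default=None)
        nxt = min((c for c in confident if c > i), default=None)
        if prev is None:  # leading uncertain frames: copy the first confident frame
            out.append(list(frames[nxt]))
        elif nxt is None:  # trailing uncertain frames: copy the last confident frame
            out.append(list(frames[prev]))
        else:
            w = (i - prev) / (nxt - prev)
            out.append([(1 - w) * a + w * b for a, b in zip(frames[prev], frames[nxt])])
    return out


if __name__ == "__main__":
    # The NLL is minimised when exp(log_var) equals the mean squared error (4.0 here).
    print(gaussian_nll([0.0], [2.0], math.log(4.0)) < gaussian_nll([0.0], [2.0], 0.0))  # True
    print(select_pseudo_labels(["a", "b", "c"], [-1.0, 2.0, 0.5], max_log_var=1.0))  # ['a', 'c']
    # Frame 1 is uncertain (e.g. occluded) and is inpainted from frames 0 and 2.
    print(inpaint_uncertain([[0.0], [100.0], [2.0]], [0.0, 5.0, 0.0], max_log_var=1.0))  # [[0.0], [1.0], [2.0]]
```

Linear interpolation is only a placeholder for the inpainting step. Interpolating rotation parameters would normally call for rotation-aware interpolation (e.g., slerp), and a learned temporal model could be used instead.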

翻译：从单张图像回归三维人体姿态与形状（HPS）的精度正日益提高，这使结果对动作识别或三维图形等下游任务具有实用价值。然而，任何回归器都并非完美，图像证据模糊或训练中未出现的姿态与外观均可能影响精度。但当前多数HPS回归器未能报告其输出结果的置信度，导致下游任务无法区分准确与不准确的估计。为解决此问题，我们提出POCO——一种新型框架，用于训练HPS回归器在单次前向传播中不仅估计三维人体，同时估计其置信度。具体而言，POCO同时估计三维人体姿态与每个样本的方差。其核心思想是引入双重条件化策略（DCS），用于回归与姿态重建质量高度相关的不确定性。POCO框架可适用于任意HPS回归器，本文通过改进HMR、PARE和CLIFF对其进行评估。在所有情况下，训练网络对不确定性进行推理均有助于其更准确地学习估计三维姿态。尽管这并非本工作目标，但改进效果虽温和却一致。我们的主要动机是为下游任务提供不确定性估计：我们通过两种方式验证：（1）利用置信度估计引导HPS训练——对无标注图像数据，将POCO训练回归器的高置信度估计作为伪真值，据此自动筛选的数据重新训练可提升精度；（2）在视频姿态估计中利用不确定性——自动识别因遮挡等引起的不确定帧，并从置信帧中对其进行修复。代码与模型将在https://poco.is.tue.mpg.de开放研究使用。