Monocular 3D human pose and shape estimation is an ill-posed problem, since multiple 3D solutions can explain a 2D image of a subject. Recent approaches predict a probability distribution over plausible 3D pose and shape parameters conditioned on the image. We show that these approaches exhibit a trade-off between three key properties: (i) accuracy: the likelihood of the ground-truth 3D solution under the predicted distribution; (ii) sample-input consistency: the extent to which 3D samples from the predicted distribution match the visible 2D image evidence; and (iii) sample diversity: the range of plausible 3D solutions modelled by the predicted distribution. Our method, HuManiFlow, predicts distributions that are simultaneously accurate, consistent and diverse. We use the human kinematic tree to factorise full body pose into ancestor-conditioned per-body-part pose distributions in an autoregressive manner. Per-body-part distributions are implemented using normalising flows that respect the manifold structure of SO(3), the Lie group of per-body-part poses. We show that 3D point estimate losses, which are ill-posed but ubiquitous, reduce sample diversity, and we therefore employ only probabilistic training losses. Code is available at: https://github.com/akashsengupta1997/HuManiFlow.
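To illustrate the autoregressive factorisation over the kinematic tree, the following is a minimal sketch of ancestral sampling of per-body-part rotations on SO(3). It is not the HuManiFlow implementation: the 5-part tree `PARENT`, the helpers `ancestors`, `exp_so3` and `sample_pose`, and the `scale` parameter are all hypothetical. The normalising flows are replaced by a toy sampler that perturbs a rotation with Gaussian noise in the tangent space and maps it back onto the manifold with the exponential map, and the image conditioning is omitted.

```python
import numpy as np

# Hypothetical 5-part kinematic tree (an assumption, not the SMPL tree):
# PARENT[i] is the parent of part i, and -1 marks the root.
# Parents are listed before their children.
PARENT = [-1, 0, 1, 0, 3]


def ancestors(i):
    """Return the indices of all ancestors of part i, nearest first."""
    chain = []
    while PARENT[i] != -1:
        i = PARENT[i]
        chain.append(i)
    return chain


def exp_so3(w):
    """Rodrigues' formula: map an axis-angle vector w to a rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def sample_pose(rng, scale=0.3):
    """Draw one rotation per body part, each conditioned on its ancestors."""
    rotations = []
    for i in range(len(PARENT)):
        # Toy conditioning (an assumption): the context is the product of
        # the already-sampled ancestor rotations, ordered root to parent.
        context = np.eye(3)
        for a in reversed(ancestors(i)):
            context = context @ rotations[a]
        # Toy stand-in for a flow on SO(3): Gaussian tangent-space noise
        # pushed onto the manifold by the exponential map.
        noise = exp_so3(scale * rng.standard_normal(3))
        rotations.append(context @ noise)
    return rotations
```

Calling `sample_pose(np.random.default_rng(0))` returns a list of five 3x3 rotation matrices. Because every sample is built from exponential maps and products of rotations, it stays on SO(3) by construction, which is the property a manifold-respecting distribution is meant to preserve.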