This paper presents an inverse kinematic optimization layer (IKOL) for 3D human pose and shape estimation that leverages the strengths of both optimization- and regression-based methods within an end-to-end framework. IKOL involves a nonconvex optimization that establishes an implicit mapping from an image's 3D keypoints and body shapes to the relative body-part rotations. The 3D keypoints and the body shapes are the inputs, and the relative body-part rotations are the solutions. However, this procedure is implicit and hard to make differentiable. To overcome this issue, we design a Gauss-Newton differentiation (GN-Diff) procedure to differentiate IKOL. GN-Diff iteratively linearizes the nonconvex objective function to obtain Gauss-Newton directions with closed-form solutions. An automatic differentiation procedure is then applied directly to generate a Jacobian matrix for end-to-end training. Notably, the GN-Diff procedure is fast because it does not rely on a time-consuming implicit differentiation procedure. The twist rotation and shape parameters are learned by neural networks and, as a result, IKOL has a much lower computational overhead than most existing optimization-based methods. Additionally, compared to existing regression-based methods, IKOL provides a more accurate mesh-image correspondence, because it iteratively reduces the distance between the keypoints and enhances the reliability of the pose structures. Extensive experiments demonstrate the superiority of our proposed framework over a wide range of 3D human pose and shape estimation methods.
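The idea behind GN-Diff (unroll Gauss-Newton steps that each have a closed-form solution, then let automatic differentiation produce the Jacobian of the solution with respect to the inputs) can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a single planar bone of fixed length whose angle `theta` is fitted to a 2D keypoint `(px, py)`, and it uses a small hand-written forward-mode `Dual` class as a stand-in for an automatic differentiation library. All names (`Dual`, `gn_solve`, `solution_and_jacobian`) are hypothetical.

```python
import math


class Dual:
    """Forward-mode autodiff scalar: a value plus its derivative w.r.t. one input."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val,
                    (self.dot * o.val - self.val * o.dot) / (o.val ** 2))

    def __neg__(self):
        return Dual(-self.val, -self.dot)


def dsin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def dcos(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.dot)


def gn_solve(px, py, bone_len=1.0, theta0=0.0, iters=10):
    """Unrolled Gauss-Newton for min_theta || bone_len*(cos, sin)(theta) - (px, py) ||^2."""
    theta, px, py = Dual.lift(theta0), Dual.lift(px), Dual.lift(py)
    for _ in range(iters):
        c, s = dcos(theta), dsin(theta)
        # Residual between the bone endpoint and the target keypoint.
        rx, ry = bone_len * c - px, bone_len * s - py
        # Jacobian of the residual w.r.t. theta (the linearization).
        jx, jy = -bone_len * s, bone_len * c
        # Closed-form Gauss-Newton direction: -(J^T J)^{-1} J^T r.
        step = -(jx * rx + jy * ry) / (jx * jx + jy * jy)
        theta = theta + step
    return theta


def solution_and_jacobian(px, py, **kwargs):
    """Differentiate through the unrolled iterations, one input at a time."""
    wrt_px = gn_solve(Dual(px, 1.0), Dual(py, 0.0), **kwargs)
    wrt_py = gn_solve(Dual(px, 0.0), Dual(py, 1.0), **kwargs)
    return wrt_px.val, (wrt_px.dot, wrt_py.dot)


theta, (dtheta_dpx, dtheta_dpy) = solution_and_jacobian(0.6, 0.8)
print(theta, dtheta_dpx, dtheta_dpy)
```

For this toy problem the minimizer is `atan2(py, px)`, so the Jacobian obtained by differentiating the unrolled iterations can be checked against the analytic gradient `(-py, px) / (px**2 + py**2)`. No implicit-function-theorem solve is needed: the derivative is simply propagated through each closed-form step.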