Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

Because single-view RGB images in the wild are hard to calibrate, existing 3D human mesh reconstruction (3DHMR) methods either assume a constant large focal length or estimate one from the background context. Neither approach handles the distortion of the torso, limbs, hands, or face that perspective projection causes when the camera is close to the human body. Such naive focal length assumptions harm the task because they yield incorrectly formulated projection matrices. To solve this, we propose Zolly, the first 3DHMR method that focuses on perspective-distorted images. Our approach begins by analysing the cause of perspective distortion, which we find stems mainly from the position of the human body relative to the camera center. We propose a new camera model and a novel 2D representation, termed the distortion image, which describes the dense 2D distortion scale of the human body. We then estimate the body's distance from the camera using distortion scale features rather than environment context features, and integrate the distortion feature with image features to reconstruct the body mesh. To formulate the correct projection matrix and locate the human body, we use perspective and weak-perspective projection losses simultaneously. Since existing datasets are not suited to this task, we propose the first synthetic dataset, PDHuman, and extend two real-world datasets tailored for it; all three contain perspective-distorted human images. Extensive experiments show that Zolly outperforms existing state-of-the-art methods on both perspective-distorted datasets and the standard benchmark (3DPW).
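
The gap between perspective and weak-perspective projection described above can be illustrated with a small numeric sketch. This is not the Zolly implementation: the function names, the pinhole setup, and the definition of the per-point distortion scale as root depth divided by point depth are assumptions made for illustration only.

```python
# Illustrative sketch only; names and the distortion-scale definition
# (root depth / point depth) are assumptions, not taken from the abstract.

def perspective(point, f):
    """Pinhole projection: each point is divided by its own depth."""
    x, y, z = point
    return (f * x / z, f * y / z)

def weak_perspective(point, f, root_depth):
    """Weak-perspective projection: every point shares the root depth."""
    x, y, _ = point
    s = f / root_depth  # single scale for the whole body
    return (s * x, s * y)

def distortion_scale(point, root_depth):
    """How much larger a point appears than under weak perspective."""
    return root_depth / point[2]

def hand_distortion(root_depth, reach=0.5):
    """Distortion of a hand stretched `reach` metres toward the camera."""
    hand = (0.3, 0.0, root_depth - reach)
    return distortion_scale(hand, root_depth)

if __name__ == "__main__":
    for depth in (1.0, 3.0, 10.0):
        print(f"body at {depth:4.1f} m -> hand scale {hand_distortion(depth):.3f}")
```

Under these assumptions the scale approaches 1 as the body moves away from the camera, which is why a constant large focal length works for distant subjects but breaks down for close-up shots.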
