Given an image of a person, we can easily infer a natural 3D pose and shape even when the image is ambiguous. This is because we have a mental model that lets us imagine the person's appearance from other viewing directions and exploit the consistency among those views during inference. However, existing human mesh recovery methods consider only the direction from which the image was taken, owing to their structural limitations. We therefore propose Implicit 3D Human Mesh Recovery (ImpHMR), which implicitly imagines a person in 3D space at the feature level via Neural Feature Fields. In ImpHMR, a CNN-based image encoder generates a feature field from a given image. A 2D feature map is then volume-rendered from the feature field for a given viewing direction, and the pose and shape parameters are regressed from that feature map. To exploit consistency with the pose and shape seen from unseen views, when 3D labels are available the model predicts results, including the silhouette, from an arbitrary direction and is trained to match them to the rotated ground truth. When only 2D labels are available, we perform self-supervised learning under the constraint that the pose and shape parameters inferred from different directions should be the same. Extensive evaluations show the efficacy of the proposed method.
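The two mechanisms named in the abstract, volume rendering of a feature from a feature field and a cross-view consistency constraint, can be illustrated with a minimal NumPy sketch. This is not the ImpHMR implementation: the function names, array shapes, the standard alpha-compositing rule, and the mean-squared form of the consistency term are all assumptions made for illustration. It also assumes the compared parameters are expressed in a common frame, so that a change of viewing direction alone does not alter them.

```python
import numpy as np

def volume_render_features(density, features, deltas):
    """Alpha-composite per-sample features along each ray (hypothetical helper).

    density:  (R, S) non-negative densities at S samples on R rays
    features: (R, S, C) feature vectors at those samples
    deltas:   (R, S) distances between adjacent samples
    returns:  (R, C) one rendered feature vector per ray
    """
    # Opacity contributed by each sample.
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance: fraction of the ray that reaches sample i unoccluded.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=1)
    weights = alpha * trans  # (R, S)
    return (weights[..., None] * features).sum(axis=1)

def view_consistency_loss(params_a, params_b):
    """Mean squared difference between parameters regressed from two views.

    Assumed form of the self-supervised constraint; the paper's exact loss
    is not given in the abstract.
    """
    return float(np.mean((params_a - params_b) ** 2))
```

In this sketch, a ray whose first sample is fully opaque returns that sample's feature, and a ray through empty space returns a zero vector. The consistency term is zero exactly when the two views yield identical parameters.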