From an image of a person, we can easily infer the natural 3D pose and shape of the person even if ambiguity exists. This is because we have a mental model that allows us to imagine a person's appearance from different viewing directions given a single image, and to use the consistency between those views for inference. However, existing human mesh recovery methods consider only the direction from which the image was taken, owing to their structural limitations. Hence, we propose "Implicit 3D Human Mesh Recovery (ImpHMR)", which can implicitly imagine a person in 3D space at the feature level via Neural Feature Fields. In ImpHMR, a CNN-based image encoder generates a feature field for a given image. A 2D feature map is then volume-rendered from the feature field for a given viewing direction, and the pose and shape parameters are regressed from this feature map. To exploit consistency with the pose and shape from unseen views, when 3D labels are available, the model predicts results, including the silhouette, from an arbitrary direction and is trained to match them to the rotated ground truth. When only 2D labels are available, we perform self-supervised learning through the constraint that the pose and shape parameters inferred from different directions should be the same. Extensive evaluations show the efficacy of the proposed method.
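As a rough illustration of the two mechanisms named in the abstract, the sketch below alpha-composites per-sample features along a single camera ray and computes a consistency penalty between parameters regressed from two viewing directions. It assumes the standard volume-rendering quadrature used with neural fields and a mean-squared-error penalty; the abstract specifies neither. The function names, arguments, and the choice of penalty are hypothetical and are not taken from the ImpHMR implementation.

```python
import math


def render_ray_feature(densities, features, deltas):
    """Alpha-composite per-sample feature vectors along one ray.

    Assumed (hypothetical) inputs:
      densities: non-negative volume density at each sample
      features:  feature vector (list of floats) at each sample
      deltas:    distance between consecutive samples
    """
    dim = len(features[0])
    rendered = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for sigma, feat, delta in zip(densities, features, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        weight = transmittance * alpha
        for k in range(dim):
            rendered[k] += weight * feat[k]
        transmittance *= 1.0 - alpha
    return rendered


def consistency_loss(params_a, params_b):
    """Mean squared difference between parameters regressed from two views.

    An assumed stand-in for the constraint that pose and shape parameters
    inferred from different directions should be the same.
    """
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b)) / len(params_a)
```

In a full pipeline, one such feature would be rendered for every pixel of the 2D feature map, and the consistency penalty would be applied to the pose and shape parameters regressed from two different viewing directions of the same feature field.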