Besides a 3D mesh, Human Mesh Recovery (HMR) methods usually need to estimate a camera in order to compute a 2D reprojection loss. Previous approaches may encounter the following problem: neither the mesh nor the camera is correct, yet their combination still yields a low reprojection loss. To alleviate this problem, we define multiple regions of interest (RoIs) containing the same human and propose a multi-RoI-based HMR method. Our key idea is that, with multiple RoIs as input, we can estimate multiple local cameras and then design and apply additional constraints between them, improving the accuracy of the cameras and, in turn, the accuracy of the corresponding 3D mesh. To implement this idea, we propose an RoI-aware feature fusion network that estimates a single 3D mesh shared by all RoIs together with one local camera per RoI. We observe that each local camera can be converted to the camera of the full image, and we use this conversion to construct a local camera consistency loss as an additional constraint on the local cameras. Another benefit of introducing multiple RoIs is that we can embed our network in a contrastive learning framework and apply a contrastive loss to regularize its training. Experiments demonstrate the effectiveness of our multi-RoI HMR method and its superiority over recent prior art. Our code is available at https://github.com/CptDiaos/Multi-RoI.
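The conversion from a local (per-RoI) camera to a full-image camera, and the consistency constraint built on it, can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the sketch uses the weak-perspective crop camera `(s, tx, ty)` and the crop-to-full-image conversion commonly used in HMR work, and it measures consistency as the mean squared deviation of the converted cameras from their mean. The function names `local_to_full_camera` and `camera_consistency_loss`, the focal length, and the box values are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def local_to_full_camera(local_cam, center, box_size, img_size, focal):
    """Convert a weak-perspective crop camera (s, tx, ty) into a
    full-image translation (tX, tY, tZ).

    Assumed convention (not confirmed by the abstract): the crop is a
    square of side `box_size` pixels centred at `center` in an image of
    size `img_size` = (width, height), with focal length `focal` in pixels.
    """
    s, tx, ty = local_cam
    cx, cy = center
    w, h = img_size
    t_z = 2.0 * focal / (s * box_size)
    # Shift the crop-relative translation by the crop's offset from the image centre.
    t_x = tx + 2.0 * (cx - w / 2.0) / (s * box_size)
    t_y = ty + 2.0 * (cy - h / 2.0) / (s * box_size)
    return np.array([t_x, t_y, t_z])

def camera_consistency_loss(full_cams):
    """Mean squared deviation of each RoI's full-image camera from the mean.

    This is an illustrative stand-in for a consistency term; the loss in
    the paper may be defined differently.
    """
    full_cams = np.asarray(full_cams, dtype=float)
    mean_cam = full_cams.mean(axis=0)
    return float(np.mean((full_cams - mean_cam) ** 2))

# Two hypothetical RoIs around the same person in a 640x480 image.
img_size, focal = (640, 480), 1000.0
cam_a = local_to_full_camera((1.0, 0.3, -0.3), (300, 250), 200, img_size, focal)
cam_b = local_to_full_camera((0.5, 0.1, -0.2), (320, 240), 400, img_size, focal)
loss = camera_consistency_loss([cam_a, cam_b])
print(cam_a, cam_b, loss)
```

In this example the two local cameras were chosen so that they describe the same full-image camera, which means the consistency term is close to zero. Perturbing either local camera makes the converted cameras disagree and the term grows, which is the kind of signal an additional constraint between local cameras can exploit.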