PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images

Source: arXiv; published in IEEE TPAMI, 2023. Updated project page: https://zhanghongwen.cn/pymaf-x. PyMAF-X is an eXpressive extension of PyMAF [arXiv:2103.16507] for monocular human/hand/face/whole-body motion capture.

We present PyMAF-X, a regression-based approach to recovering parametric full-body models from monocular images. This task is very challenging since minor parametric deviation may lead to noticeable misalignment between the estimated mesh and the input image. Moreover, when integrating part-specific estimations into the full-body model, existing solutions tend to either degrade the alignment or produce unnatural wrist poses. To address these issues, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop in our regression network for well-aligned human mesh recovery and extend it as PyMAF-X for the recovery of expressive full-body models. The core idea of PyMAF is to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status. Specifically, given the currently predicted parameters, mesh-aligned evidence will be extracted from finer-resolution features accordingly and fed back for parameter rectification. To enhance the alignment perception, an auxiliary dense supervision is employed to provide mesh-image correspondence guidance while spatial alignment attention is introduced to enable the awareness of the global contexts for our network. When extending PyMAF for full-body mesh recovery, an adaptive integration strategy is proposed in PyMAF-X to produce natural wrist poses while maintaining the well-aligned performance of the part-specific estimations. The efficacy of our approach is validated on several benchmark datasets for body, hand, face, and full-body mesh recovery, where PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results. The project page with code and video results can be found at https://zhanghongwen.cn/pymaf-x.
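The feedback loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a 2D "mesh" of four points controlled by scale and translation in place of a parametric body model, random feature maps in place of a CNN feature pyramid, and untrained random linear layers in place of learned regressors. All function names are hypothetical. It only shows the control flow: project the current estimate, sample mesh-aligned features from the next finer level, and feed a residual update back into the parameters.

```python
import random


def bilinear_sample(feat, x, y):
    """Sample a 2D feature map (list of rows) at normalized coords in [-1, 1]."""
    h, w = len(feat), len(feat[0])
    # Map normalized coordinates to pixel coordinates and clamp to the map.
    px = min(max((x + 1) / 2 * (w - 1), 0.0), w - 1)
    py = min(max((y + 1) / 2 * (h - 1), 0.0), h - 1)
    x0, y0 = int(px), int(py)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = px - x0, py - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bot = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bot * ay


def project_mesh(params, template):
    """Toy stand-in for 'body model + camera': scale and translate 2D points."""
    s, tx, ty = params
    return [(s * vx + tx, s * vy + ty) for vx, vy in template]


def linear_regressor(weights, bias, inputs):
    """Placeholder for a learned regressor: one linear layer giving a residual."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def feedback_loop(pyramid, template, init_params, regressors):
    """Refine parameters coarse-to-fine using mesh-aligned features."""
    params = list(init_params)
    history = [tuple(params)]
    for feat, (weights, bias) in zip(pyramid, regressors):
        # 1. Project the current estimate onto the image plane.
        points = project_mesh(params, template)
        # 2. Extract mesh-aligned evidence from this (finer) feature level.
        aligned = [bilinear_sample(feat, x, y) for x, y in points]
        # 3. Feed the evidence and current parameters back to get a residual.
        delta = linear_regressor(weights, bias, aligned + params)
        params = [p + d for p, d in zip(params, delta)]
        history.append(tuple(params))
    return params, history


rng = random.Random(0)
template = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
# Feature pyramid ordered from coarse (4x4) to fine (16x16); values are random.
pyramid = [[[rng.random() for _ in range(r)] for _ in range(r)]
           for r in (4, 8, 16)]
n_inputs = len(template) + 3  # sampled features + (scale, tx, ty)
regressors = []
for _ in pyramid:
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_inputs)]
               for _ in range(3)]
    regressors.append((weights, [0.0, 0.0, 0.0]))

params, history = feedback_loop(pyramid, template, [1.0, 0.0, 0.0], regressors)
for step, p in enumerate(history):
    print(step, [round(v, 4) for v in p])
```

Because the regressors here are untrained, the printed parameters carry no meaning. In a trained network each iteration would move the projected mesh closer to the image evidence, which is why the loop samples from progressively finer feature maps.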
