Existing deep learning-based methods for human mesh reconstruction struggle to balance accuracy against computational efficiency. They typically prioritize accuracy, which leads to large networks and high computational cost and can hinder deployment in real-world settings such as virtual reality systems. To address this issue, we introduce a modular, multi-stage, lightweight graph-based transformer network for human pose and shape estimation from 2D human pose: a pose-based human mesh reconstruction approach that prioritizes computational efficiency without sacrificing reconstruction accuracy. Our method consists of a 2D-to-3D lifter module, which uses graph transformers to analyze structured and implicit joint correlations in 2D human poses, and a mesh regression module, which combines the extracted pose features with a mesh template to produce the final human mesh parameters.
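The two-module data flow described above can be illustrated with a minimal, untrained sketch in plain Python. Everything specific in it is an assumption made for illustration and is not taken from the method itself: the five-joint toy skeleton in `EDGES`, the feature width `dim`, the random weights, the reading of "structured" correlations as attention masked by skeleton adjacency and "implicit" correlations as unmasked attention over all joint pairs, and the choice to output per-vertex coordinates rather than mesh parameters. The names `attention` and `lift_and_regress` are hypothetical.

```python
import math
import random

# Hypothetical 5-joint toy skeleton (not the paper's joint set).
EDGES = [(0, 1), (1, 2), (0, 3), (3, 4)]


def adjacency(num_joints, edges):
    """Boolean adjacency matrix with self-loops for an undirected skeleton."""
    adj = [[i == j for j in range(num_joints)] for i in range(num_joints)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = True
    return adj


def matmul(x, w):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def rand_matrix(rows, cols, rng):
    """Random stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, wq, wk, wv, mask=None):
    """Single-head scaled dot-product attention over joints.

    If mask is given, joint i only attends to joints j with mask[i][j] True.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = sum(a * b for a, b in zip(qi, kj)) / scale
            if mask is not None and not mask[i][j]:
                s = float("-inf")  # excluded pair gets zero attention weight
            scores.append(s)
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def lift_and_regress(pose_2d, template, dim=8, seed=0):
    """Toy lifter plus mesh regressor with random (untrained) weights."""
    rng = random.Random(seed)
    adj = adjacency(len(pose_2d), EDGES)
    feats = matmul(pose_2d, rand_matrix(2, dim, rng))  # embed 2D joints
    ws = [rand_matrix(dim, dim, rng) for _ in range(6)]
    structured = attention(feats, ws[0], ws[1], ws[2], mask=adj)  # skeleton edges
    implicit = attention(feats, ws[3], ws[4], ws[5])              # all joint pairs
    feats = [[f + s + m for f, s, m in zip(fr, sr, mr)]           # residual sum
             for fr, sr, mr in zip(feats, structured, implicit)]
    pose_3d = matmul(feats, rand_matrix(dim, 3, rng))             # 2D-to-3D lifting
    # Mesh regression: pool pose features, pair them with each template vertex,
    # and predict a per-vertex offset from the template.
    pooled = [sum(col) / len(feats) for col in zip(*feats)]
    w_mesh = rand_matrix(3 + dim, 3, rng)
    offsets = matmul([list(vtx) + pooled for vtx in template], w_mesh)
    mesh = [[t + o for t, o in zip(vtx, off)] for vtx, off in zip(template, offsets)]
    return pose_3d, mesh


pose_2d = [[0.0, 0.0], [0.1, 0.5], [0.1, 1.0], [-0.1, 0.5], [-0.1, 1.0]]
template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_3d, mesh = lift_and_regress(pose_2d, template)
print(len(pose_3d), len(mesh))
```

The sketch only shows how a 2D pose passes through a graph-aware lifter and is then combined with a mesh template. A trained model would replace the random matrices with learned parameters and would likely stack several such layers.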