We introduce SAM 3D Body (3DB), a promptable model for single-image full-body 3D human mesh recovery (HMR) that demonstrates state-of-the-art performance, with strong generalization and consistent accuracy in diverse in-the-wild conditions. 3DB estimates the human pose of the body, feet, and hands. It is the first model to use a new parametric mesh representation, Momentum Human Rig (MHR), which decouples skeletal structure and surface shape. 3DB employs an encoder-decoder architecture and supports auxiliary prompts, including 2D keypoints and masks, enabling user-guided inference similar to the SAM family of models. We derive high-quality annotations from a multi-stage annotation pipeline that uses various combinations of manual keypoint annotation, differentiable optimization, multi-view geometry, and dense keypoint detection. Our data engine efficiently selects and processes data to ensure data diversity, collecting unusual poses and rare imaging conditions. We present a new evaluation dataset organized by pose and appearance categories, enabling nuanced analysis of model behavior. Our experiments demonstrate superior generalization and substantial improvements over prior methods in both qualitative user preference studies and traditional quantitative analysis. Both 3DB and MHR are open-source.