VEPHand: View-Efficient Photometric Hand Performance Capture at Scale

Zhengyang Shen, Kai-Hung Chang, Erroll Wood, Deying Kong, Bo Peng, Timo Bolkart, Jinlong Yang, Bowen Zhao, Danhang Tang, Sasa Petrovic, Emre Aksan, Jérémy Riviere, Vassilis Choutas, Delio Vicini, Jay Busch, Shichen Liu, Zhe Cao, Hugh Liu, JingJing Shen, Jonathan Taylor, Mingsong Dou

Robust, high-fidelity 3D hand capture is fundamental to digital human creation, yet it remains challenging for practical multi-view systems, which must balance rich photometry against the geometric ambiguities of reconstruction that arise from limited viewpoint density. This paper presents an end-to-end pipeline for dynamic hand performance capture and registration, specifically designed for view-efficient setups ($\sim$20 views). We address the key challenges with two primary innovations. First, to overcome reconstruction difficulties such as limited view overlap and background clutter, our mask-free neural method robustly extracts detailed hand geometry and appearance from unmasked images using scene parameterization and scenario-specific density regularization. Second, to address registration challenges such as accurately capturing non-linear skin deformations and producing plausible results during severe self-contact, we propose a physics-inspired framework. It aligns reconstructions to a personalized hand model by optimizing intrinsic volumetric offsets within the model's canonical tetrahedral mesh, alongside pose parameters. Supported by robust losses and optimization, this approach captures fine surface deformations, yields plausible results under severe articulation and self-contact, and shows strong tolerance to input noise. We demonstrate the scalability and robustness of our automated pipeline on an extensive dataset of over 12,000 sequences, from which we also derive a large-scale, high-quality synthetic 2D/3D hand dataset for training downstream tasks. These results showcase the pipeline's effectiveness for single hands, intricate two-hand interactions, and natural hand-object manipulations. Our method achieves state-of-the-art reconstruction fidelity in view-efficient, unmasked scenarios, together with highly accurate registration. Our project page is available at https://zyshen021.github.io/VEPHand/.
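
The abstract does not give implementation details, so the following is only a toy illustration of the general idea behind the registration step: jointly optimizing a pose-like parameter and per-vertex offsets under a robust loss. It is not the authors' method. Everything here is an assumption made for illustration: the pose is reduced to a single translation `t`, the offsets are free per-vertex displacements rather than volumetric offsets in a tetrahedral mesh, the Huber penalty stands in for the unspecified "robust losses", and all function and parameter names (`huber`, `energy_and_grads`, `register`, `delta`, `lam`, `lr`) are hypothetical.

```python
import numpy as np


def huber(r_norm, delta):
    """Huber penalty on residual magnitudes: quadratic near zero, linear in the tails."""
    quad = 0.5 * r_norm**2
    lin = delta * (r_norm - 0.5 * delta)
    return np.where(r_norm <= delta, quad, lin)


def energy_and_grads(offsets, t, template, target, delta, lam):
    """Mean robust data term plus a quadratic penalty that keeps offsets small."""
    n = len(template)
    r = template + offsets + t - target          # (N, 3) residuals
    r_norm = np.linalg.norm(r, axis=1)           # (N,) residual magnitudes
    energy = huber(r_norm, delta).mean() + lam * (offsets**2).sum(axis=1).mean()
    # Gradient of the Huber term w.r.t. r: r itself inside the quadratic zone,
    # delta * r / ||r|| outside it, so outliers have bounded influence.
    scale = np.where(r_norm <= delta, 1.0, delta / np.maximum(r_norm, 1e-12))
    g = scale[:, None] * r
    grad_offsets = (g + 2.0 * lam * offsets) / n
    grad_t = g.mean(axis=0)
    return energy, grad_offsets, grad_t


def register(template, target, delta=0.05, lam=0.1, lr=0.5, steps=500):
    """Plain gradient descent on the translation (assumed pose) and the offsets."""
    offsets = np.zeros_like(template)
    t = np.zeros(3)
    history = []
    for _ in range(steps):
        e, g_off, g_t = energy_and_grads(offsets, t, template, target, delta, lam)
        history.append(e)
        offsets -= lr * g_off
        t -= lr * g_t
    return offsets, t, history


# Synthetic demo data (assumed): a shifted, noisy copy of the template
# in which a few points are gross outliers.
rng = np.random.default_rng(0)
template = rng.normal(size=(50, 3))
target = template + np.array([0.1, -0.2, 0.05]) + 0.01 * rng.normal(size=(50, 3))
target[:5] += 1.0  # simulated reconstruction noise

offsets, t, history = register(template, target)
```

In this sketch the offset penalty `lam` plays the role of a regularizer: without it, the per-vertex offsets could absorb the outliers entirely, whereas the bounded Huber gradient limits how strongly any single noisy target point can pull on the solution.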
