Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera

We propose Dyn-HaMR, to the best of our knowledge the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Reconstructing accurate 3D hand meshes from monocular video is a crucial task for understanding human behaviour, with significant applications in augmented and virtual reality (AR/VR). However, existing methods for monocular hand reconstruction typically rely on a weak-perspective camera model, which represents hand motion only within a limited camera frustum. As a result, these approaches struggle to recover the full 3D global trajectory and often produce noisy or incorrect depth estimates, particularly when the video is captured by a dynamic (moving) camera, as is common in egocentric scenarios. Dyn-HaMR is a multi-stage, multi-objective optimization pipeline that integrates (i) simultaneous localization and mapping (SLAM) to robustly estimate relative camera motion, (ii) an interacting-hand prior that performs generative infilling and refines the interaction dynamics, ensuring plausible recovery under (self-)occlusion, and (iii) hierarchical initialization built from a combination of state-of-the-art hand tracking methods. Extensive evaluations on both in-the-wild and indoor datasets show that our approach significantly outperforms state-of-the-art methods in 4D global mesh recovery, establishing a new benchmark for hand motion reconstruction from monocular video captured by moving cameras. Our project page is at https://dyn-hamr.github.io/.
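
The two limitations named above can be illustrated with a short, generic NumPy sketch. It is not Dyn-HaMR's implementation: the function names (perspective_project, weak_perspective_project, rot_y, camera_to_world), the focal length and principal point, and the synthetic camera trajectory standing in for SLAM output are all illustrative assumptions. Part 1 shows that a weak-perspective projection ignores per-point depth, so shifting the hand farther from the camera leaves its projection unchanged. Part 2 shows that a hand joint that is static in the world appears to move in camera coordinates when the camera moves, and that composing with per-frame camera-to-world poses (the quantity SLAM provides) recovers its world-frame position.

```python
import numpy as np


def perspective_project(points_cam, focal, center):
    """Pinhole projection of camera-frame points (N, 3) to pixels (N, 2)."""
    z = points_cam[:, 2:3]  # per-point depth
    return focal * points_cam[:, :2] / z + center


def weak_perspective_project(points_cam, scale, trans):
    """Weak-perspective projection: one global scale stands in for every depth."""
    return scale * points_cam[:, :2] + trans  # the z column is ignored entirely


def rot_y(theta):
    """Rotation matrix about the camera's vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def camera_to_world(points_cam, R_wc, t_wc):
    """Lift per-frame camera-frame points (T, N, 3) into a shared world frame.

    R_wc: (T, 3, 3) camera-to-world rotations; t_wc: (T, 3) camera centres,
    i.e. the kind of trajectory a SLAM system would supply (assumed here).
    """
    return np.einsum("tij,tnj->tni", R_wc, points_cam) + t_wc[:, None, :]


# 1) Weak perspective discards depth: pushing points 0.5 m farther away
#    leaves their projection unchanged, while a pinhole camera sees the shift.
joints = np.array([[0.05, 0.02, 0.6], [-0.03, 0.04, 0.65]])
farther = joints + np.array([0.0, 0.0, 0.5])
wp_same = np.allclose(weak_perspective_project(joints, 500.0, 0.0),
                      weak_perspective_project(farther, 500.0, 0.0))
pp_same = np.allclose(perspective_project(joints, 500.0, 320.0),
                      perspective_project(farther, 500.0, 320.0))

# 2) A moving camera: a hand joint fixed in the world appears to move in
#    camera coordinates; composing with the camera poses recovers it.
T = 5
R_wc = np.stack([rot_y(0.05 * t) for t in range(T)])  # synthetic, not real SLAM
t_wc = np.array([[0.1 * t, 0.0, 0.0] for t in range(T)])
hand_world = np.array([0.0, 0.0, 2.0])
# World-to-camera for each frame: x_c = R_wc^T (x_w - t_wc)
hand_cam = np.einsum("tji,tj->ti", R_wc, hand_world - t_wc)[:, None, :]
recovered = camera_to_world(hand_cam, R_wc, t_wc)

apparent_motion = hand_cam[:, 0, 0].max() - hand_cam[:, 0, 0].min()
print(wp_same, pp_same)  # True False
print(apparent_motion > 0, np.allclose(recovered[:, 0], hand_world))  # True True
```

The example isolates the geometry only: real systems must additionally handle the unknown metric scale of monocular SLAM and noisy per-frame hand estimates, which is where the optimization and priors described in the abstract come in.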

翻译：我们提出了Dyn-HaMR，据我们所知，这是首个从野外动态相机拍摄的单目视频中重建4D全局手部运动的方法。从单目视频中重建精确的3D手部网格是理解人类行为的关键任务，在增强现实和虚拟现实（AR/VR）中具有重要应用。然而，现有的单目手部重建方法通常依赖于弱透视相机模型，该模型在有限的相机视锥内模拟手部运动。因此，这些方法难以恢复完整的3D全局轨迹，并且经常产生噪声或错误的深度估计，尤其是在视频由动态或移动相机（这在以自我为中心的视角场景中很常见）拍摄时。我们的Dyn-HaMR包含一个多阶段、多目标的优化流程，该流程综合考虑了：（i）同时定位与地图构建（SLAM）以稳健地估计相对相机运动，（ii）用于生成式填充和优化交互动态的交互手部先验，确保在（自）遮挡下实现合理的恢复，以及（iii）通过结合最先进的手部跟踪方法进行分层初始化。通过在野外和室内数据集上的广泛评估，我们表明，在4D全局网格恢复方面，我们的方法显著优于现有最先进的方法。这为从移动相机拍摄的单目视频中进行手部运动重建建立了一个新的基准。我们的项目页面位于 https://dyn-hamr.github.io/。