Reconstructing interacting hands from monocular RGB data is challenging because of many interfering factors, such as self-occlusion, mutual occlusion, and similar textures. Previous works leverage information from only a single RGB image and do not model the physically plausible relation between the hands, which leads to inferior reconstruction results. In this work, we explicitly exploit spatial-temporal information to achieve better interacting hand reconstruction. On the one hand, we leverage temporal context to complement the insufficient information provided by a single frame, and design a novel temporal framework with a temporal constraint that enforces smooth interacting hand motion. On the other hand, we propose an interpenetration detection module to produce kinetically plausible interacting hands without physical collisions. Extensive experiments validate the effectiveness of the proposed framework, which achieves new state-of-the-art performance on public benchmarks.