Reconstructing interacting hands from monocular RGB data is challenging because of many interfering factors, e.g., self-occlusion, mutual occlusion, and similar textures. Previous works leverage information from only a single RGB image and do not model the physically plausible relation between the hands, which leads to inferior reconstruction results. In this work, we explicitly exploit spatial-temporal information to achieve better interacting hand reconstruction. On the one hand, we leverage temporal context to complement the insufficient information provided by a single frame, and design a novel temporal framework with a temporal constraint that encourages smooth interacting hand motion. On the other hand, we propose an interpenetration detection module to produce kinetically plausible interacting hands without physical collisions. Extensive experiments validate the effectiveness of the proposed framework, which achieves new state-of-the-art performance on public benchmarks.
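
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the formulation used in this work. It assumes hand joints are given as 3D positions in meters. The temporal term is taken here to be a mean squared second-order finite difference over frames, and the interpenetration check is a crude proxy that places a sphere of fixed radius at each joint and reports overlapping left/right pairs. The function names, the `radius` value, and both formulations are illustrative assumptions.

```python
import numpy as np


def temporal_smoothness_penalty(joints):
    """Mean squared second-order difference of joint positions over time.

    joints: array of shape (T, J, 3), T frames, J joints (assumed layout).
    Returns 0.0 for fewer than three frames, where no acceleration is defined.
    """
    joints = np.asarray(joints, dtype=float)
    if joints.shape[0] < 3:
        return 0.0
    # Discrete acceleration: x[t+1] - 2*x[t] + x[t-1] for each interior frame.
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]
    return float(np.mean(np.sum(accel ** 2, axis=-1)))


def sphere_overlap_pairs(left, right, radius=0.008):
    """Report (left_index, right_index) pairs whose joint spheres overlap.

    left: (J, 3) and right: (K, 3) joint positions for the two hands.
    radius: hypothetical per-joint sphere radius in meters (an assumption).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Pairwise Euclidean distances between every left and right joint.
    dists = np.linalg.norm(left[:, None, :] - right[None, :, :], axis=-1)
    # Two equal spheres overlap when their centers are closer than 2 * radius.
    i, j = np.nonzero(dists < 2.0 * radius)
    return list(zip(i.tolist(), j.tolist()))
```

Under these assumptions, a hand moving at constant velocity incurs zero penalty, while frame-to-frame jitter is penalized quadratically. The sphere test only flags proximity between joints; detecting interpenetration of actual hand surfaces would require a mesh-level test.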