Reconstructing two hands from monocular RGB images is challenging due to frequent occlusion and mutual confusion. Existing methods mainly learn an entangled representation to encode two interacting hands, which is highly fragile under impaired interaction, such as truncated hands, separate hands, or external occlusion. This paper presents ACR (Attention Collaboration-based Regressor), which makes the first attempt to reconstruct hands in arbitrary scenarios. To achieve this, ACR explicitly mitigates interdependencies between hands and between parts by leveraging center and part-based attention for feature extraction. However, reducing interdependence relaxes the input constraints at the cost of weakening the mutual reasoning needed to reconstruct interacting hands. Thus, based on center attention, ACR also learns a cross-hand prior that handles interacting hands better. We evaluate our method on various types of hand reconstruction datasets. Our method significantly outperforms the best interacting-hand approaches on the InterHand2.6M dataset while yielding performance comparable to state-of-the-art single-hand methods on the FreiHand dataset. Additional qualitative results on in-the-wild and hand-object interaction datasets and on web images/videos further demonstrate the effectiveness of our approach for arbitrary hand reconstruction. Our code is available at https://github.com/ZhengdiYu/Arbitrary-Hands-3D-Reconstruction.
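The idea of using attention maps to extract a separate feature for each hand can be illustrated with a minimal sketch. This is not the ACR implementation: the abstract does not specify the architecture, so the function names `spatial_softmax` and `attention_pool`, the softmax normalization, and the toy 2x2 maps below are all assumptions made for illustration. The sketch only shows how one shared feature map, pooled under two different center-style attention maps, yields two disentangled per-hand feature vectors.

```python
import math


def spatial_softmax(attn):
    """Normalize an H x W map of raw scores into weights that sum to 1."""
    flat = [v for row in attn for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    width = len(attn[0])
    return [
        [exps[i * width + j] / total for j in range(width)]
        for i in range(len(attn))
    ]


def attention_pool(features, attn):
    """Attention-weighted sum of a C x H x W feature map (hypothetical helper).

    Returns one value per channel, i.e. a C-dimensional feature vector.
    """
    weights = spatial_softmax(attn)
    return [
        sum(
            weights[i][j] * channel[i][j]
            for i in range(len(channel))
            for j in range(len(channel[0]))
        )
        for channel in features
    ]


# Toy shared feature map with 2 channels on a 2 x 2 grid (made-up values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 10.0]],
]

# Assumed attention scores: one map peaks at the top-left cell (one hand),
# the other at the bottom-right cell (the other hand).
left_center = [[20.0, 0.0], [0.0, 0.0]]
right_center = [[0.0, 0.0], [0.0, 20.0]]

left_feat = attention_pool(features, left_center)
right_feat = attention_pool(features, right_center)
print(left_feat, right_feat)
```

Because each attention map concentrates nearly all of its weight on a single cell, each pooled vector is dominated by the features at that hand's location, so a missing or occluded second hand does not change the feature extracted for the first. A part-based variant would, under the same assumptions, simply use one attention map per hand part instead of one per hand.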