We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition, and motion generation. Accurately reconstructing these interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from head movement. To address these challenges, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis of 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios that remain intractable for state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community's knowledge foundation and facilitate future studies on egocentric hand-object interactions.