Understanding bimanual human hand activities is a critical problem in AI and robotics. Building large models of bimanual activities has not been possible because existing datasets lack scale, coverage of diverse hand activities, and detailed annotations. We introduce GigaHands, a massive annotated dataset capturing 34 hours of bimanual hand activities from 56 subjects and 417 objects, totaling 14k motion clips derived from 183 million image frames paired with 84k text annotations. Our markerless capture setup and data acquisition protocol enable fully automatic 3D hand and object pose estimation while greatly reducing the effort required for text annotation. The scale and diversity of GigaHands enable broad applications, including text-driven action synthesis, hand motion captioning, and dynamic radiance field reconstruction.