Humans commonly work with multiple objects in daily life and can intuitively transfer manipulation skills to novel objects by understanding object functional regularities. However, existing approaches for analyzing and synthesizing hand-object manipulation are mostly limited to a single hand and a single object due to the lack of data support. To address this, we construct TACO, an extensive bimanual hand-object-interaction dataset spanning a large variety of tool-action-object compositions for daily human activities. TACO contains 2.5K motion sequences paired with third-person and egocentric views, precise hand-object 3D meshes, and action labels. To rapidly expand the data scale, we present a fully automatic data acquisition pipeline combining multi-view sensing with an optical motion capture system. Building on the broad range of research directions that TACO supports, we benchmark three generalizable hand-object-interaction tasks: compositional action recognition, generalizable hand-object motion forecasting, and cooperative grasp synthesis. Extensive experiments reveal new insights, challenges, and opportunities for advancing the study of generalizable hand-object motion analysis and synthesis. Our data and code are available at https://taco2024.github.io.