To enable machines to learn how humans interact with the physical world in daily activities, it is crucial to provide rich data that captures the 3D motion of both humans and objects in a learnable 3D representation. Ideally, this data should be collected in a natural setup, capturing the authentic dynamic 3D signals that arise during human-object interactions. To address this challenge, we introduce the ParaHome system, designed to capture and parameterize the dynamic 3D movements of humans and objects within a common home environment. Our system consists of a multi-view setup with 70 synchronized RGB cameras, together with wearable motion capture devices comprising an IMU-based body suit and hand motion capture gloves. Using the ParaHome system, we collect a novel large-scale dataset of human-object interaction. Notably, our dataset offers key advancements over existing datasets in three main aspects: (1) capturing 3D body motion and dexterous hand manipulation alongside 3D object movement within a contextual home environment during natural activities; (2) encompassing human interaction with multiple objects in various episodic scenarios, with corresponding text descriptions; (3) including articulated objects with multiple parts, expressed with parameterized articulations. Building upon our dataset, we introduce new research tasks aimed at building a generative model for learning and synthesizing human-object interactions in a real-world room setting.