We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze research on category-level human-object interaction (HOI). HOI4D consists of 2.4M RGB-D egocentric video frames over 4000 sequences, collected by 4 participants interacting with 800 different object instances from 16 categories across 610 different indoor rooms. We provide frame-wise annotations for panoptic segmentation, motion segmentation, 3D hand pose, category-level object pose, and hand action, together with reconstructed object meshes and scene point clouds. With HOI4D, we establish three benchmarking tasks to promote category-level HOI from 4D visual signals: semantic segmentation of 4D dynamic point cloud sequences, category-level object pose tracking, and egocentric action segmentation with diverse interaction targets. In-depth analysis shows that HOI4D poses significant challenges to existing methods and opens up substantial research opportunities.