Humans naturally interact both with other people and with the multiple objects around them, engaging in various social activities. However, recent advances in modeling human-object interactions mostly focus on perceiving isolated individuals and objects, due to fundamental data scarcity. In this paper, we introduce HOI-M3, a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects. Notably, it provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs, covering 199 sequences and 181M frames of diverse humans and objects engaged in rich activities. Building on the unique HOI-M3 dataset, we introduce two novel data-driven tasks with strong companion baselines: monocular capture and unstructured generation of multiple human-object interactions. Extensive experiments demonstrate that our dataset is challenging and warrants further research on multiple human-object interaction and behavior analysis. Our HOI-M3 dataset, corresponding code, and pre-trained models will be disseminated to the community for future research.