We introduce HiSC4D, a novel Human-centered interaction and 4D Scene Capture method that aims to accurately and efficiently create a dynamic digital world containing large-scale indoor-outdoor scenes, diverse human motions, rich human-human interactions, and human-environment interactions. Using body-mounted IMUs and a head-mounted LiDAR, HiSC4D captures egocentric human motions in unconstrained space without external devices or pre-built maps. This affords great flexibility and accessibility for human-centered interaction and 4D scene capture in various environments. IMUs can capture spatially unrestricted human poses but are prone to drift over long periods of use, whereas LiDAR is stable for global localization but coarse for local positions and orientations. HiSC4D therefore employs a joint optimization method that harmonizes all sensors and exploits environment cues, yielding promising results for long-term capture in large scenes. To promote research on egocentric human interaction in large scenes and to facilitate downstream tasks, we also present a dataset containing 8 sequences in 4 large scenes (200 to 5,000 $m^2$). It provides 36k frames of accurate 4D human motions with SMPL annotations and dynamic scenes, 31k frames of cropped human point clouds, and scene meshes of the environments. A variety of scenarios, such as a basketball gym and a commercial street, alongside challenging human motions, such as daily greeting, one-on-one basketball playing, and tour guiding, demonstrate the effectiveness and generalization ability of HiSC4D. The dataset and code will be made publicly available at www.lidarhumanmotion.net/hisc4d for research purposes.