To address data scarcity and the difficulty of advanced motion synthesis in human-scene interaction (HSI) modeling, we introduce the TRUMANS dataset alongside a novel HSI motion synthesis method. TRUMANS is the most comprehensive motion-captured HSI dataset currently available, encompassing over 15 hours of human interactions across 100 indoor scenes. It captures whole-body human motions and part-level object dynamics, with a focus on the realism of contact. The dataset is further scaled up by transforming physical environments into exact virtual models and applying extensive augmentations to appearance and motion for both humans and objects while maintaining interaction fidelity. Using TRUMANS, we devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length, taking into account both scene context and intended actions. In experiments, our approach shows remarkable zero-shot generalizability on a range of 3D scene datasets (e.g., PROX, Replica, ScanNet, ScanNet++), producing motions that closely mimic original motion-captured sequences, as confirmed by quantitative experiments and human studies.