To address the challenges of data scarcity and motion synthesis in human-scene interaction (HSI) modeling, we introduce the TRUMANS dataset alongside a novel HSI motion synthesis method. TRUMANS is the most comprehensive motion-captured HSI dataset currently available, encompassing over 15 hours of human interactions across 100 indoor scenes. It captures whole-body human motions and part-level object dynamics, with an emphasis on contact realism. The dataset is further scaled up by replicating physical environments as exact virtual models and applying extensive augmentations to the appearance and motion of both humans and objects while maintaining interaction fidelity. Building on TRUMANS, we devise a diffusion-based autoregressive model that efficiently generates HSI sequences of arbitrary length, conditioned on both scene context and intended actions. In experiments, our approach demonstrates strong zero-shot generalizability across a range of 3D scene datasets (e.g., PROX, Replica, ScanNet, ScanNet++), producing motions that closely mimic original motion-captured sequences, as confirmed by quantitative experiments and human studies.