Human-human motion generation is essential for understanding humans as social beings. Current methods fall into two main categories: single-person-based methods and separate modeling-based methods. To delve into this field, we abstract the overall generation process into a general framework, MetaMotion, which consists of two phases: temporal modeling and interaction mixing. For temporal modeling, single-person-based methods directly concatenate the two people into a single sequence, while separate modeling-based methods skip the modeling of interaction sequences. This inadequate modeling results in sub-optimal performance and redundant model parameters. In this paper, we introduce TIMotion (Temporal and Interactive Modeling), an efficient and effective framework for human-human motion generation. Specifically, we first propose Causal Interactive Injection to model the two separate sequences as a single causal sequence by leveraging their temporal and causal properties. We then present Role-Evolving Scanning to adapt to the changes in active and passive roles throughout the interaction. Finally, to generate smoother and more rational motion, we design Localized Pattern Amplification to capture short-term motion patterns. Extensive experiments on InterHuman and InterX demonstrate that our method achieves superior performance. Project page: https://aigc-explorer.github.io/TIMotion-page/