Text-to-motion synthesis is a crucial task in computer vision. Existing methods lack universality: they are tailored to single-person or two-person scenarios and cannot be applied to generate motions for more individuals. To achieve number-free motion synthesis, this paper reconsiders motion generation and proposes unifying single-person and multi-person motion through a conditional motion distribution. We further design a generation module and an interaction module for our FreeMotion framework that decouple the process of conditional motion generation and thereby support number-free motion synthesis. In addition, existing single-person spatial motion control methods can be seamlessly integrated into our framework, enabling precise control of multi-person motion. Extensive experiments demonstrate the superior performance of our method and its ability to infer single-human and multi-human motions simultaneously.
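One way to read the phrase "conditional motion distribution" is through the chain rule of probability. The sketch below is illustrative only: the symbols $c$ (the text prompt), $x^{n}$ (the motion sequence of the $n$-th person), and $N$ (the number of people) are assumed notation and do not appear in the abstract, and the abstract does not state how the generation and interaction modules map onto these factors.

```latex
% Assumed notation: c = text prompt, x^{n} = motion of person n, N = number of people.
% The chain rule factorizes the joint multi-person distribution into
% per-person conditional distributions:
\begin{equation}
  p\left(x^{1}, \dots, x^{N} \mid c\right)
  = p\left(x^{1} \mid c\right)
    \prod_{n=2}^{N} p\left(x^{n} \mid c,\, x^{1}, \dots, x^{n-1}\right)
\end{equation}
% For N = 1 the product is empty and the expression reduces to the
% single-person case p(x^{1} | c).
```

Under this reading, single-person and multi-person synthesis differ only in how many other motions each factor is conditioned on, so a model of the per-person conditional would not need a fixed $N$.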