Recent advances in generative modeling and tokenization have driven significant progress in text-to-motion generation, improving the quality and realism of generated motions. However, effectively leveraging textual information for conditional motion generation remains an open challenge. We observe that current approaches, which rely primarily on fixed-length text embeddings (e.g., CLIP) for global semantic injection, struggle to capture the composite nature of human motion, resulting in suboptimal motion quality and controllability. To address this limitation, we propose the Composite Aware Semantic Injection Mechanism (CASIM), comprising a composite-aware semantic encoder and a text-motion aligner that learns the dynamic correspondence between text and motion tokens. Notably, CASIM is model- and representation-agnostic, readily integrating with both autoregressive and diffusion-based methods. Experiments on the HumanML3D and KIT benchmarks demonstrate that CASIM consistently improves motion quality, text-motion alignment, and retrieval scores across state-of-the-art methods. Qualitative analyses further highlight the superiority of our composite-aware approach over fixed-length semantic injection, enabling precise motion control from text prompts and stronger generalization to unseen text inputs.
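To make the contrast with fixed-length semantic injection concrete, below is a minimal PyTorch sketch of the general idea described above: instead of conditioning every motion token on a single pooled sentence vector, each motion token cross-attends over the full sequence of per-word text embeddings, so different segments of a composite prompt can steer different parts of the motion. All names here (`CompositeAwareInjectionSketch`, `motion_dim`, `text_dim`, etc.) are illustrative assumptions for exposition, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class CompositeAwareInjectionSketch(nn.Module):
    """Hypothetical sketch of composite-aware semantic injection.

    Motion tokens (queries) attend over per-word text embeddings
    (keys/values) rather than one fixed-length pooled vector, letting
    each motion token bind to the relevant part of a composite prompt.
    """

    def __init__(self, motion_dim: int = 256, text_dim: int = 512, n_heads: int = 4):
        super().__init__()
        # Project per-word text features to the motion token width.
        self.text_proj = nn.Linear(text_dim, motion_dim)
        # Cross-attention acting as the text-motion aligner.
        self.aligner = nn.MultiheadAttention(motion_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(motion_dim)

    def forward(self, motion_tokens, text_tokens, text_padding_mask=None):
        # motion_tokens: (B, T_motion, motion_dim) -- queries
        # text_tokens:   (B, T_text, text_dim)     -- per-word features
        txt = self.text_proj(text_tokens)
        attended, attn_weights = self.aligner(
            query=motion_tokens, key=txt, value=txt,
            key_padding_mask=text_padding_mask,
        )
        # Residual injection; attn_weights expose the learned dynamic
        # text-motion correspondence for inspection.
        return self.norm(motion_tokens + attended), attn_weights


if __name__ == "__main__":
    B, T_motion, T_text = 2, 196, 20
    inject = CompositeAwareInjectionSketch()
    motion = torch.randn(B, T_motion, 256)
    text = torch.randn(B, T_text, 512)
    out, weights = inject(motion, text)
    print(out.shape, weights.shape)
    # torch.Size([2, 196, 256]) torch.Size([2, 196, 20])
```

Because the injection operates per motion token, a sketch like this can sit inside either an autoregressive decoder layer or a diffusion denoiser block, which is consistent with the model- and representation-agnostic claim; the returned attention weights also give a natural handle for the qualitative alignment analyses mentioned above.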