We introduce MoRAG, a novel multi-part fusion-based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach employs a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models, and sample videos are available at: https://motion-rag.github.io/