Gesture synthesis, the task of producing natural, contextually appropriate gestures that accompany speech or textual input, has attracted significant attention as a research area. Although deep learning-based approaches have made remarkable progress, they often overlook the rich semantic information in the text, which leads to less expressive and less meaningful gestures. We propose GesGPT, a gesture generation approach that leverages the semantic analysis capabilities of large language models (LLMs) such as GPT. Building on the strength of LLMs in text analysis, we design prompts that extract gesture-related information from textual input. Specifically, we develop prompt principles that recast gesture generation as an intention classification problem solved by GPT, and we use a curated gesture library together with an integration module to produce semantically rich co-speech gestures. Experimental results show that GesGPT generates contextually appropriate and expressive gestures, offering a new perspective on semantic co-speech gesture generation.
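The pipeline summarized in the abstract (prompt an LLM to classify the intention of each sentence, then retrieve matching clips from a gesture library) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the GesGPT implementation: the label set, the prompt wording, the clip IDs in `GESTURE_LIBRARY`, and the helper names are all hypothetical, and the LLM is passed in as a plain callable so the sketch runs offline with a keyword-based stand-in instead of a real GPT call. The integration module described in the abstract (timing and blending clips) is not modeled here.

```python
import string
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical intention labels; the actual label set is not given in the abstract.
INTENTIONS = ("greeting", "negation", "emphasis", "neutral")

# Hypothetical gesture library: intention label -> candidate motion clip IDs.
GESTURE_LIBRARY = {
    "greeting": ["wave_01", "wave_02"],
    "negation": ["head_shake_01", "hand_wave_no_01"],
    "emphasis": ["beat_strong_01"],
    "neutral": ["rest_idle_01"],
}


def build_prompt(sentence: str) -> str:
    """Assemble a classification prompt (illustrative wording only)."""
    labels = ", ".join(INTENTIONS)
    return (
        "Classify the communicative intention of the sentence below "
        f"into exactly one of: {labels}.\n"
        f"Sentence: {sentence}\n"
        "Answer with the label only."
    )


def classify_intention(sentence: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a label; fall back to 'neutral' on unexpected replies."""
    reply = llm(build_prompt(sentence)).strip().lower()
    return reply if reply in INTENTIONS else "neutral"


@dataclass
class GestureSegment:
    sentence: str
    intention: str
    clip_id: str


def generate_gestures(sentences: List[str], llm: Callable[[str], str]) -> List[GestureSegment]:
    """Map each sentence to one library clip via its classified intention."""
    segments = []
    for i, sentence in enumerate(sentences):
        intention = classify_intention(sentence, llm)
        clips = GESTURE_LIBRARY[intention]
        # Cycle through candidates deterministically for some variety.
        segments.append(GestureSegment(sentence, intention, clips[i % len(clips)]))
    return segments


def keyword_stub_llm(prompt: str) -> str:
    """Offline stand-in for a GPT call: naive keyword rules on the sentence."""
    sentence = prompt.split("Sentence: ", 1)[1].split("\n", 1)[0].lower()
    words = sentence.translate(str.maketrans("", "", string.punctuation)).split()
    if "hello" in words:
        return "greeting"
    if "no" in words or "not" in words:
        return "negation"
    if "really" in words:
        return "emphasis"
    return "neutral"
```

For example, `generate_gestures(["Hello everyone!", "I really mean it.", "No, that is wrong."], keyword_stub_llm)` yields the intentions greeting, emphasis, and negation with clips `wave_01`, `beat_strong_01`, and `head_shake_01`. Passing the LLM as a callable keeps the classification step swappable: a real deployment would replace `keyword_stub_llm` with a function that sends the prompt to a chat model and returns its text reply.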