Gesture synthesis has attracted significant attention as a research field; it aims to produce natural, contextually appropriate gestures that correspond to speech or text input. Although deep learning-based approaches have made remarkable progress, they often overlook the rich semantic information in the text, which leads to less expressive and less meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models (LLMs) such as ChatGPT. Capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach that generates professional gestures and base gestures and integrates them through a text parsing script, yielding diverse and meaningful gestures. First, we develop prompt principles that recast gesture generation as an intention classification problem solved with ChatGPT, and we further analyze emphasis words and semantic words to aid gesture generation. Next, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling gesture synthesis into professional gestures and base gestures. Finally, we merge the professional gestures with the base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures.
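The three stages described above (intention classification, lexicon lookup, and merging with base gestures) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the intent labels, the cue words, the gesture names, and the per-word merge rule are hypothetical, and `classify_intent` is a keyword lookup standing in for the ChatGPT prompt, whose wording the abstract does not give.

```python
import re

# Hypothetical cue words standing in for the LLM-based intent classifier.
INTENT_CUES = {
    "greeting": {"hello", "hi", "welcome"},
    "negation": {"no", "not", "never"},
}

# Hypothetical gesture lexicon: intent label -> professional gesture name.
GESTURE_LEXICON = {
    "greeting": "wave",
    "negation": "palm_sweep",
}

BASE_GESTURE = "beat"  # placeholder name for a generic base gesture


def tokenize(sentence):
    """Split a sentence into alphabetic word tokens."""
    return re.findall(r"[A-Za-z']+", sentence)


def classify_intent(sentence):
    """Stand-in for the LLM call: return the first intent whose cue words appear."""
    words = {w.lower() for w in tokenize(sentence)}
    for intent, cues in INTENT_CUES.items():
        if words & cues:
            return intent
    return "none"


def plan_gestures(sentence, emphasis_word):
    """Assign a base gesture to every word, then overlay the professional
    gesture (if the intent has one) on the first occurrence of the emphasis word."""
    words = tokenize(sentence)
    track = [BASE_GESTURE] * len(words)
    professional = GESTURE_LEXICON.get(classify_intent(sentence))
    if professional is not None:
        for i, word in enumerate(words):
            if word.lower() == emphasis_word.lower():
                track[i] = professional
                break
    return list(zip(words, track))


if __name__ == "__main__":
    print(plan_gestures("Hello everyone", "Hello"))
```

In this sketch the emphasis word is passed in by the caller; in the approach described above it would instead come from the additional analysis of emphasis and semantic words.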