In this paper, we propose SignLLM, a multilingual Sign Language Production (SLP) large language model that includes two novel multilingual SLP modes, MLSF and Prompt2LangGloss, which enable sign language gesture generation from query-text inputs and question-style prompt inputs, respectively. Both modes can use a new reinforcement-learning (RL) loss and a new RL module named the Priority Learning Channel. These RL components accelerate training by enhancing the model's ability to sample high-quality data. To train SignLLM, we introduce Prompt2Sign, a comprehensive multilingual sign language dataset built from public data covering American Sign Language (ASL) and seven other sign languages. The dataset standardizes its information by extracting pose data from sign language videos into a unified, compressed format. We extensively evaluate SignLLM and show that it achieves state-of-the-art performance on SLP tasks across eight sign languages.