In this paper, we introduce Prompt2Sign, the first comprehensive multilingual sign language dataset, built from public data covering American Sign Language (ASL) and seven other sign languages. The dataset transforms a vast array of videos into a streamlined, model-friendly format optimized for training translation models such as seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which offers two novel multilingual SLP modes for generating sign language gestures from input texts or prompts. Both modes can use a new loss function and a reinforcement-learning-based module that accelerate training by enhancing the model's ability to autonomously sample high-quality data. We present benchmark results showing that SignLLM achieves state-of-the-art performance on SLP tasks across eight sign languages.
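As a rough illustration of the reinforcement-learning-based re-weighting mentioned above, the sketch below shows one plausible reading in PyTorch: per-sample pose-reconstruction losses are re-weighted by reward scores so that training concentrates on high-quality examples. The class name `RLWeightedLoss`, the tensor shapes, and the softmax weighting are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class RLWeightedLoss(nn.Module):
    """Hypothetical sketch: per-sample pose-reconstruction error is
    re-weighted by a reward signal so training emphasizes samples judged
    to be high quality. Not the paper's actual method."""

    def __init__(self):
        super().__init__()
        # reduction="none" keeps a loss value per element so we can
        # aggregate and weight per sample.
        self.base_loss = nn.MSELoss(reduction="none")

    def forward(self, pred_poses, target_poses, rewards):
        # pred_poses, target_poses: (batch, frames, keypoint_dims)
        # rewards: (batch,) quality scores per sample (assumed given)
        per_sample = self.base_loss(pred_poses, target_poses).mean(dim=(1, 2))
        # Turn rewards into a distribution over the batch; higher-reward
        # samples contribute more to the gradient.
        weights = torch.softmax(rewards, dim=0)
        return (weights * per_sample).sum()

# Toy usage: batch of 4 pose sequences, 16 frames, 50 keypoint coordinates.
pred = torch.randn(4, 16, 50, requires_grad=True)
target = torch.randn(4, 16, 50)
rewards = torch.rand(4)  # e.g., heuristic data-quality scores
loss = RLWeightedLoss()(pred, target, rewards)
loss.backward()
```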