Sign language translation systems are complex and comprise many components, which makes it difficult to compare methods across publications. We present an open-source implementation of a text-to-gloss-to-pose-to-video pipeline and demonstrate conversion from German to Swiss German Sign Language, French to French Sign Language of Switzerland, and Italian to Italian Sign Language of Switzerland. We propose three different components for text-to-gloss translation: a lemmatizer, a rule-based word reordering and dropping component, and a neural machine translation system. Gloss-to-pose conversion uses lexicon data for the three signed languages, with skeletal poses extracted from videos. To generate a sentence, the text-to-gloss system is run first, and the pose representations of the resulting signs are then stitched together.
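The flow from text to glosses to a stitched pose sequence can be illustrated with a minimal sketch. It is not the released implementation: the lemma table, the list of dropped words, the toy lexicon, and the function names `text_to_gloss` and `gloss_to_pose` are all hypothetical, and the linear interpolation used to join signs is an assumption, since the abstract only states that poses are stitched together.

```python
from typing import Dict, List

Frame = List[float]  # flattened keypoint coordinates for one video frame
Pose = List[Frame]   # one sign is a sequence of frames

# Hypothetical toy resources. A real system would use a proper lemmatizer
# and a lexicon of skeletal poses extracted from videos.
LEMMAS: Dict[str, str] = {"gehe": "gehen", "ging": "gehen"}
DROP_WORDS = {"der", "die", "das", "ein", "eine"}
LEXICON: Dict[str, Pose] = {
    "ICH": [[0.0, 0.0], [1.0, 1.0]],
    "GEHEN": [[3.0, 1.0], [4.0, 2.0]],
}


def text_to_gloss(sentence: str) -> List[str]:
    """Lemmatize tokens and drop function words (rule-based sketch)."""
    glosses: List[str] = []
    for token in sentence.lower().split():
        token = token.strip(".,!?")
        if not token or token in DROP_WORDS:
            continue
        glosses.append(LEMMAS.get(token, token).upper())
    return glosses


def interpolate(start: Frame, end: Frame, steps: int) -> Pose:
    """Return `steps` evenly spaced frames strictly between start and end."""
    return [
        [a + (b - a) * k / (steps + 1) for a, b in zip(start, end)]
        for k in range(1, steps + 1)
    ]


def gloss_to_pose(glosses: List[str], transition_frames: int = 1) -> Pose:
    """Look up each gloss in the lexicon and stitch the poses together."""
    sequence: Pose = []
    for gloss in glosses:
        sign = LEXICON.get(gloss)
        if sign is None:
            continue  # assumption: glosses missing from the lexicon are skipped
        if sequence and transition_frames > 0:
            # Assumed stitching strategy: linear interpolation between signs.
            sequence.extend(interpolate(sequence[-1], sign[0], transition_frames))
        sequence.extend(sign)
    return sequence


glosses = text_to_gloss("Ich gehe.")
pose_sequence = gloss_to_pose(glosses)
```

In this sketch the text-to-gloss step could be replaced by a neural machine translation system without changing `gloss_to_pose`, which reflects the modular design the abstract describes.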