Advances in automatic sign language translation (SLT) to spoken languages have mostly been benchmarked on datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using reduced BLEU, rather than the widely used BLEU score, as the reference metric for validation. We report a BLEU score of 8.03 and publish the first open-source implementation of its kind to promote further advances.