In this paper, we investigate the use of transformers for Neural Machine Translation (NMT) of text-to-GLOSS for Deaf and Hard-of-Hearing communication. Because data and resources for text-to-GLOSS translation are scarce, we treat the problem as a low-resource language task. We apply our novel hyper-parameter exploration technique to a range of architectural parameters, including layer count, attention heads, embedding dimension, dropout, and label smoothing, to build a transformer-based architecture tailored to text-to-GLOSS translation, with the aim of improving the accuracy and fluency of NMT-generated GLOSS. Experiments on the PHOENIX14T dataset show that the resulting architecture outperforms previous work on the same dataset. The best model reaches a ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score of 55.18% and a BLEU-1 (BiLingual Evaluation Understudy 1) score of 63.6%, outperforming state-of-the-art results on BLEU-1 and ROUGE by 8.42 and 0.63, respectively.
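
As a rough illustration of what a search over the parameters named above involves, the sketch below enumerates a plain exhaustive grid. It is not the exploration technique used in this work, which the abstract does not describe; the candidate values, the `SEARCH_SPACE` dictionary, and the `enumerate_configs` helper are all hypothetical and chosen only for the example.

```python
from itertools import product

# Hypothetical candidate values: the abstract names these parameters
# but does not state the ranges that were explored.
SEARCH_SPACE = {
    "num_layers": [1, 2, 4],
    "num_heads": [2, 4, 8],
    "embed_dim": [64, 128, 256],
    "dropout": [0.1, 0.3],
    "label_smoothing": [0.0, 0.1],
}


def enumerate_configs(space):
    """Yield every combination whose embedding splits evenly across heads."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        # Multi-head attention requires embed_dim to be divisible by num_heads.
        if config["embed_dim"] % config["num_heads"] == 0:
            yield config


configs = list(enumerate_configs(SEARCH_SPACE))
print(len(configs))  # 3 * 3 * 3 * 2 * 2 = 108 candidate configurations
```

Even this small grid yields 108 candidate models, each of which would have to be trained and scored, which suggests why a more selective exploration strategy is attractive in a low-resource setting.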