Sign Language Translation (SLT) is a promising technology for bridging the communication gap between deaf and hearing people. Recently, researchers have adopted Neural Machine Translation (NMT) methods, which usually require large-scale corpora for training, to achieve SLT. However, publicly available SLT corpora are very limited, which leads to collapsed token representations and inaccurately generated tokens. To alleviate this issue, we propose ConSLT, a novel token-level \textbf{Con}trastive learning framework for \textbf{S}ign \textbf{L}anguage \textbf{T}ranslation, which learns effective token representations by incorporating token-level contrastive learning into the SLT decoding process. Concretely, during decoding ConSLT treats each token and its counterpart generated with a different dropout mask as a positive pair, and randomly samples $K$ tokens from the vocabulary that do not appear in the current sentence to construct negative examples. We conduct comprehensive experiments on two benchmarks (PHOENIX14T and CSL-Daily) in both end-to-end and cascaded settings. The experimental results demonstrate that ConSLT achieves better translation quality than strong baselines.
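The pair construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the InfoNCE-style loss, the cosine similarity, the `temperature` parameter, and the helper names `sample_negatives` and `token_contrastive_loss` are all assumptions introduced here for illustration, since the abstract states only how positive and negative examples are chosen.

```python
import math
import random


def sample_negatives(vocab_size, sentence_ids, k, rng):
    """Sample k distinct vocabulary ids that do not occur in the current sentence."""
    present = set(sentence_ids)
    candidates = [i for i in range(vocab_size) if i not in present]
    # Raises ValueError if fewer than k candidates remain.
    return rng.sample(candidates, k)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def token_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single token (assumed form, not from the paper).

    anchor:    token representation from one forward pass
    positive:  representation of the same token under a different dropout mask
    negatives: representations of the K sampled out-of-sentence tokens
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

In this sketch the loss is small when the two dropout views of a token agree and the sampled negatives point elsewhere, which is the behaviour a contrastive objective is meant to encourage. In a real system the representations would come from the decoder rather than from hand-written lists.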