Fine-grained information about translation errors is valuable to the translation evaluation community. Existing approaches cannot consider error position and error type at the same time, and therefore fail to integrate the two kinds of error information. In this paper, we propose the Fine-Grained Translation Error Detection (FG-TED) task, which aims to identify both the position and the type of translation errors in a given source-hypothesis sentence pair. We also build an FG-TED model that predicts \textbf{addition} and \textbf{omission} errors, two typical translation accuracy errors. We formulate the model under a word-level classification paradigm and apply shortcut learning reduction to mitigate the influence of monolingual features. In addition, we construct synthetic datasets for model training and reduce labeling disagreements in authoritative datasets, making the experimental benchmark consistent. Experiments show that our model identifies error type and position concurrently and achieves state-of-the-art results on the restored dataset. Our model also delivers more reliable predictions than existing baselines in low-resource and transfer scenarios. The related datasets and source code will be released in the future.
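To make the word-level formulation concrete, the following is a minimal sketch of how per-word tags could be decoded into error spans that carry both a position and a type. It is an illustration under stated assumptions, not the model described above: the label names `OK`, `ADD`, and `OMIT`, the helper `tags_to_spans`, and the convention that addition is tagged on hypothesis words while omission is tagged on source words are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# (start index, end index exclusive, error type); a hypothetical output format.
Span = Tuple[int, int, str]


def tags_to_spans(tags: List[str], ok_label: str = "OK") -> List[Span]:
    """Merge runs of identical non-OK word tags into (start, end, type) spans."""
    spans: List[Span] = []
    start = None
    # A trailing sentinel tag closes any span that runs to the end of the sentence.
    for i, tag in enumerate(tags + [ok_label]):
        if start is not None and tag != tags[start]:
            spans.append((start, i, tags[start]))
            start = None
        if start is None and tag != ok_label:
            start = i
    return spans


# Assumed convention: ADD is tagged on hypothesis words, OMIT on source words.
hypothesis_tags = ["OK", "ADD", "ADD", "OK"]
source_tags = ["OK", "OK", "OMIT"]

addition_spans = tags_to_spans(hypothesis_tags)
omission_spans = tags_to_spans(source_tags)
```

Under these assumptions, each span records where an error lies (the word index range) and what kind of error it is (the shared tag), which is the pairing of position and type that the FG-TED task asks for.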