Translated texts and utterances bear several hallmarks that distinguish them from texts originally written in the language. This phenomenon, known as translationese, is well documented, and its presence in training or test sets can affect model performance. Still, mitigating the effect of translationese in human-translated text remains understudied. We hypothesize that Abstract Meaning Representation (AMR), a semantic representation that abstracts away from the surface form, can be used as an interlingua to reduce the amount of translationese in translated texts. By parsing English translations into AMR graphs and then generating text from those graphs, we obtain texts that more closely resemble non-translationese by macro-level measures. We show, across four metrics and qualitatively, that using AMR as an interlingua reduces translationese, and we compare our results to two additional approaches: one based on round-trip machine translation and one based on syntactically controlled generation.
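The parse-then-generate procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `amr_round_trip`, `Parser`, `Generator`, and `type_token_ratio` are hypothetical, the AMR parser and generator are passed in as plain callables so that any model can be plugged in, and type-token ratio is used only as an example of a macro-level measure, which may or may not be among the four metrics reported.

```python
from typing import Callable, Iterable, List

# Hypothetical type aliases (assumptions for this sketch):
# any AMR parser or AMR-to-text generator can be wrapped to fit them.
Parser = Callable[[str], str]     # English sentence -> AMR graph (e.g. a PENMAN string)
Generator = Callable[[str], str]  # AMR graph -> English sentence


def amr_round_trip(
    sentences: Iterable[str], parse: Parser, generate: Generator
) -> List[str]:
    """Parse each sentence into an AMR graph, then regenerate text from that graph."""
    return [generate(parse(sentence)) for sentence in sentences]


def type_token_ratio(sentences: Iterable[str]) -> float:
    """Illustrative macro-level measure: distinct tokens divided by total tokens.

    Assumption: whitespace tokenization and lowercasing; returns 0.0 for empty input.
    """
    tokens = [tok.lower() for s in sentences for tok in s.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

In use, one would wrap an actual AMR parser and generator as `parse` and `generate`, run `amr_round_trip` over the translated English corpus, and compare a measure such as `type_token_ratio` on the original translations, the regenerated texts, and a corpus of text originally written in English.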