Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical deployment, where generating clean output from noisy input is crucial. The MTNT dataset \cite{MTNT} is widely used as a benchmark for evaluating the robustness of NMT models against noisy input. Nevertheless, its utility is limited because noise is present in both the source and target sentences. To address this limitation, we clean the noise from the target sentences in MTNT, making it more suitable as a benchmark for noise evaluation. We use large language models (LLMs) for this purpose and observe that they are remarkably capable at noise removal. For example, they can remove emojis while taking their semantic meaning into account. Additionally, we show that LLMs can effectively rephrase slang, jargon, and profanities. The resulting datasets, called C-MTNT, exhibit significantly less noise in the target sentences while preserving the semantic integrity of the original sentences. Our human and GPT-4 evaluations consistently conclude that LLMs perform well on this task. Lastly, experiments on C-MTNT demonstrate its effectiveness in evaluating the robustness of NMT models, highlighting the potential of advanced language models for data cleaning and establishing C-MTNT as a valuable resource.