Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical deployment, where generating clean output from noisy input is crucial. The MTNT dataset is widely used as a benchmark for evaluating the robustness of NMT models against noisy input. Its utility is limited, however, because noise is present in both the source and target sentences. To address this limitation, we clean the noise from the target sentences in MTNT, making it more suitable as a benchmark for noise evaluation. We use large language models (LLMs) for this cleaning and observe that they remove noise remarkably well. For example, they can remove emojis while taking their semantic meaning into account. We also show that LLMs can effectively rephrase slang, jargon, and profanities. The resulting datasets, called C-MTNT, exhibit significantly less noise in the target sentences while preserving the semantic integrity of the original sentences. Our human and GPT-4 evaluations consistently conclude that LLMs perform well on this task. Lastly, experiments on C-MTNT demonstrate its effectiveness in evaluating the robustness of NMT models, highlighting the potential of advanced language models for data cleaning and underscoring the value of C-MTNT as a resource.