Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This task remains underexplored compared with general machine translation (MT). In this paper, we propose NeoAMT, an agentic framework for neologism-aware machine translation that uses a Wiktionary search tool. Specifically, we first create a new dataset for neologism-aware machine translation and develop a search tool based on Wiktionary. The dataset covers 16 languages and 75 translation directions and is derived from approximately 10 million records of an English Wiktionary dump. The retrieval corpus of the search tool is constructed from around 3 million cleaned records of the Wiktionary dump. We then use these resources to train the translation agent with reinforcement learning (RL) and to evaluate the accuracy of neologism-aware machine translation. Building on the dataset and search tool, we also propose an RL training framework with a novel reward design and an adaptive rollout generation approach that leverages "translation difficulty" to further improve the translation quality of translation agents that use our search tool.