Retrieval-augmented machine translation leverages examples from a translation memory by retrieving similar instances. These examples are used to condition the predictions of a neural decoder. We aim to improve the upstream retrieval step and consider a fixed downstream edit-based model: the multi-Levenshtein Transformer. The task consists of finding a set of examples that maximizes the overall coverage of the source sentence. To this end, we rely on the theory of submodular functions and explore new algorithms to optimize this coverage. We evaluate the resulting performance gains for the machine translation task.
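To make the coverage objective concrete, here is a minimal sketch of greedy example selection under a cardinality budget. It is an illustration, not the algorithm evaluated in this work: it assumes coverage is measured as the number of distinct source tokens matched by at least one retrieved example, and the names `greedy_select`, `memory`, and `k` are hypothetical. Set coverage of this kind is monotone and submodular, so the classical greedy procedure is known to reach at least a (1 - 1/e) fraction of the optimal coverage for a given budget.

```python
from typing import List, Sequence, Set, Tuple


def greedy_select(
    source_tokens: Sequence[str],
    memory: Sequence[Sequence[str]],
    k: int,
) -> Tuple[List[int], Set[str]]:
    """Greedily pick up to k examples that maximize token coverage of the source.

    Assumption: coverage = number of distinct source tokens that appear in
    at least one selected example (a simplified stand-in for the objective).
    Returns the indices of the chosen examples and the covered source tokens.
    """
    source = set(source_tokens)
    covered: Set[str] = set()
    chosen: List[int] = []
    remaining = list(range(len(memory)))

    for _ in range(k):
        best, best_gain = None, 0
        for i in remaining:
            # Marginal gain: source tokens this example covers that are not covered yet.
            gain = len((set(memory[i]) & source) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            # No remaining example adds coverage; stop early.
            break
        chosen.append(best)
        covered |= set(memory[best]) & source
        remaining.remove(best)

    return chosen, covered


# Toy translation memory (source sides only), purely illustrative.
source = "the cat sat on the mat".split()
memory = [
    "the cat slept".split(),
    "a dog sat on the rug".split(),
    "the mat is red".split(),
]
chosen, covered = greedy_select(source, memory, k=2)
```

The second pick is driven by marginal gain rather than raw similarity: once an example already covers "the", other examples are credited only for the new source tokens they contribute, which is the diminishing-returns behaviour that submodularity formalizes.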