We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distractor terms. We explore automatic evaluation metrics for FRMT and validate their correlation with expert human evaluation across both region-matched and mismatched rating scenarios. Finally, we present a number of baseline models for this task, and offer guidelines for how researchers can train, evaluate, and compare their own models. Our dataset and evaluation code are publicly available: https://bit.ly/frmt-task