In this paper, we introduce a data-driven approach to Formality-Sensitive Machine Translation (FSMT) that accounts for the distinct linguistic properties of four target languages. Our methodology rests on two core strategies: 1) language-specific data handling, and 2) synthetic data generation using large-scale language models and empirical prompt engineering. The approach yields a considerable improvement over the baseline, which highlights the effectiveness of data-centric techniques. Our prompt engineering strategy improves performance further by producing higher-quality synthetic translation examples.
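To make the second strategy concrete, the following is a minimal sketch of how a few-shot prompt for formality-controlled synthetic data might be assembled. It is an illustration under stated assumptions, not the prompt used in this work: the names `Example` and `build_prompt`, the prompt wording, and the choice of German as the target language are all hypothetical and are not taken from the paper. The sketch only builds the prompt string; sending it to a language model is left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    """One demonstration pair (hypothetical structure, for illustration only)."""
    source: str
    target: str
    formality: str  # "formal" or "informal"


def build_prompt(source: str, target_language: str, formality: str,
                 shots: Sequence[Example]) -> str:
    """Assemble a few-shot prompt that asks for a translation in one register.

    Only demonstrations whose formality matches the requested register are
    included, so the model sees consistent examples of the desired style.
    """
    if formality not in ("formal", "informal"):
        raise ValueError(f"unknown formality: {formality!r}")

    lines = [
        f"Translate the English sentence into {target_language} "
        f"using a {formality} register.",
        "",
    ]
    for shot in shots:
        if shot.formality != formality:
            continue  # skip demonstrations in the other register
        lines.append(f"English: {shot.source}")
        lines.append(f"{target_language} ({formality}): {shot.target}")
        lines.append("")

    # The final, unanswered slot is what the model is expected to complete.
    lines.append(f"English: {source}")
    lines.append(f"{target_language} ({formality}):")
    return "\n".join(lines)


# Illustrative demonstrations; German is an assumption made for this sketch.
SHOTS = [
    Example("How are you?", "Wie geht es Ihnen?", "formal"),
    Example("How are you?", "Wie geht es dir?", "informal"),
]

if __name__ == "__main__":
    print(build_prompt("Where do you live?", "German", "formal", SHOTS))
```

Filtering the demonstrations by register is one plausible design choice: it keeps every example in the prompt consistent with the requested formality level. Other prompt designs, such as showing contrastive formal and informal pairs side by side, would be equally reasonable to test empirically.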