In this paper, we introduce a data-driven approach to Formality-Sensitive Machine Translation (FSMT) that accounts for the linguistic properties of four target languages. Our methodology centers on two core strategies: 1) language-specific data handling, and 2) synthetic data generation using large-scale language models and empirical prompt engineering. This approach yields a considerable improvement over the baseline, highlighting the effectiveness of data-centric techniques. Our prompt engineering strategy further improves performance by producing higher-quality synthetic translation examples.