One approach to multilingual data-to-text generation is to translate grammatical configurations from the source language into each target language up front. A surface realizer and the document-planning stages then use these configurations to generate output. In this paper, we describe a rule-based NLG implementation of this approach in which the configurations are translated by Neural Machine Translation (NMT) and then reviewed once by a human. We also introduce a cross-language grammar dependency model to build a multilingual NLG system that generates text from the source data, so that the generation phase scales without a human in the loop. Additionally, we introduce a method for evaluating the automatically translated text through human post-editing. Our evaluation on the SportSett:Basketball dataset shows that our NLG system performs well, producing grammatically correct text in translation tasks in particular.