Methods for generating text from structured data have advanced significantly in recent years, primarily due to the fine-tuning of pre-trained language models on large datasets. However, such models can fail to produce output that is faithful to the input data, particularly on out-of-domain data. Sufficient annotated data is often not available for specific domains, leading us to seek an unsupervised approach to improve the faithfulness of the output text. Since the problem is fundamentally one of consistency between the representations of the structured data and the text, we evaluate the effectiveness of cycle training in this work. Cycle training uses two models that are inverses of each other: one that generates text from structured data, and one that generates structured data from natural language text. We show that cycle training, when initialized with a small amount of supervised data (100 samples in our case), achieves nearly the same performance as fully supervised approaches for the data-to-text generation task on the WebNLG, E2E, WTQ, and WSQL datasets. We perform extensive empirical analysis with automated evaluation metrics and a newly designed human evaluation schema to reveal the effectiveness of different cycle training strategies in reducing various types of generation errors. Our code is publicly available at https://github.com/Edillower/CycleNLG.
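The control flow of cycle training described above can be illustrated with a minimal sketch. This is not the released implementation: `ToyModel`, `init_supervised`, and `cycle_epoch` are hypothetical names, the toy model simply memorizes source-target pairs in place of a fine-tuned pre-trained language model, and the linearized data format in the usage note is an assumption made for illustration.

```python
from typing import Dict, Iterable, Tuple


class ToyModel:
    """Hypothetical stand-in for a seq2seq model: memorizes source -> target pairs."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def train_step(self, source: str, target: str) -> None:
        # A real model would take a gradient step on a reconstruction loss here.
        self.memory[source] = target

    def generate(self, source: str) -> str:
        # Echo the input when the source has not been seen (toy fallback).
        return self.memory.get(source, source)


def init_supervised(data_to_text: ToyModel, text_to_data: ToyModel,
                    pairs: Iterable[Tuple[str, str]]) -> None:
    """Warm-start both directions from a small set of paired (data, text) samples."""
    for data, text in pairs:
        data_to_text.train_step(data, text)
        text_to_data.train_step(text, data)


def cycle_epoch(data_to_text: ToyModel, text_to_data: ToyModel,
                unpaired_data: Iterable[str], unpaired_text: Iterable[str]) -> None:
    """Run one pass of both cycles over unpaired data and unpaired text."""
    # Data -> text -> data: generate text from data, then train the
    # text-to-data model to reconstruct the original data from that text.
    for data in unpaired_data:
        pseudo_text = data_to_text.generate(data)
        text_to_data.train_step(pseudo_text, data)
    # Text -> data -> text: generate data from text, then train the
    # data-to-text model to reconstruct the original text from that data.
    for text in unpaired_text:
        pseudo_data = text_to_data.generate(text)
        data_to_text.train_step(pseudo_data, text)
```

As a usage example under the same assumptions, one could create two `ToyModel` instances, call `init_supervised` with a single pair such as `("name=Aromi | food=Chinese", "Aromi serves Chinese food.")`, and then call `cycle_epoch` on unpaired inputs. Each model is only ever trained to reconstruct a real input from the other model's output, which is what ties the two representations together without requiring paired annotations.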