Pre-trained language models (PLMs) have made remarkable progress on table-to-text generation tasks. However, the lack of labeled domain-specific knowledge and the topology gap between tabular data and text make it difficult for PLMs to yield faithful text. Low-resource generation likewise faces unique challenges in this domain. Inspired by how humans describe tabular data using prior knowledge, we propose a new framework, PromptMize, which targets table-to-text generation under few-shot settings. The framework consists of two components: a prompt planner and a knowledge adapter. The prompt planner generates a prompt signal that provides instance-level guidance for PLMs, bridging the topology gap between tabular data and text. The knowledge adapter memorizes domain-specific knowledge from an unlabeled corpus to supply essential information during generation. We conduct extensive experiments and analyses on three open-domain few-shot NLG datasets: human, song, and book. Compared with previous state-of-the-art approaches, our model achieves remarkable generation quality as judged by both human and automatic evaluations.