Pre-trained language models (PLMs) have achieved remarkable progress in table-to-text generation tasks. However, the lack of labeled domain-specific knowledge and the topology gap between tabular data and text make it difficult for PLMs to yield faithful text. Low-resource generation likewise faces unique challenges in this domain. Inspired by how humans describe tabular data with prior knowledge, we propose a new framework, PromptMize, which targets table-to-text generation under few-shot settings. The framework consists of two components: a prompt planner and a knowledge adapter. The prompt planner generates a prompt signal that provides instance-level guidance for PLMs, bridging the topology gap between tabular data and text. The knowledge adapter memorizes domain-specific knowledge from an unlabeled corpus and supplies essential information during generation. We conduct extensive experiments and analyses on three open-domain few-shot NLG datasets: human, song, and book. Compared with previous state-of-the-art approaches, our model achieves remarkable generation quality as judged by both human and automatic evaluations.