Pretrained language models (PLMs) have made remarkable progress in table-to-text generation tasks. However, their lack of domain-specific knowledge makes it challenging to bridge the topological gap between tabular data and text, especially in real-world applications with limited resources. To mitigate the limitation of insufficient labeled data, we propose a novel framework: Adapt-Prompt-to-Generate (AdaPTGen). The core insight of AdaPTGen is to adapt prompt templates of domain-specific knowledge into the model, which brings at least three benefits: (1) it injects representations of normal table-related descriptions to bridge the topological gap between tabular data and text; (2) it enables us to make full use of large amounts of unlabeled domain-specific knowledge, which alleviates the PLMs' inherent lack of domain knowledge; (3) it allows us to design various tasks to explore the domain-specific knowledge. We conduct extensive experiments and analyses on three open-domain few-shot natural language generation (NLG) datasets: Humans, Songs, and Books. Compared to previous state-of-the-art approaches, our model achieves superior performance in terms of both fluency and accuracy.