STOAT: Structured Data to Analytical Text With Controls

Recent language models have made tremendous progress on the structured-data-to-text generation task. However, these models still perform sub-optimally when logical inference is required to generate the descriptions. In this work, we focus specifically on analytical text generation from structured data such as tables. Building on the taxonomy proposed by Gupta et al. (2020), we focus on controllable table-to-text generation for the following reasoning categories: numerical reasoning, commonsense reasoning, temporal reasoning, table knowledge, and entity knowledge. We propose the STOAT model, which is table- and reasoning-aware and uses vector quantization to infuse the given reasoning categories into the output. We observe that our model improves the PARENT metric by 10.19% on iToTTo and by 1.13% on InfoTabs for the analytical sentence task. In human evaluation, we also found that our model generates 15.3% more faithful and analytical descriptions than the baseline models. We curate and release two reasoning-category-annotated table-to-interesting-text generation datasets based on the ToTTo (Parikh et al., 2020) and InfoTabs (Gupta et al., 2020) datasets.
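The abstract names vector quantization without describing how STOAT applies it. As a rough illustration of the generic idea only, the sketch below snaps a continuous vector to the nearest entry in a small codebook. It is not STOAT's architecture. Everything in it is a hypothetical assumption made for illustration: the `CODEBOOK` name, its 3-dimensional values, the one-entry-per-reasoning-category layout, and the `quantize` helper.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL codebook: one vector per reasoning category.
# Real VQ codebooks are learned during training and are far higher-dimensional;
# these values are made up purely to illustrate the lookup.
CODEBOOK: Dict[str, List[float]] = {
    "numerical": [1.0, 0.0, 0.0],
    "commonsense": [0.0, 1.0, 0.0],
    "temporal": [0.0, 0.0, 1.0],
    "table_knowledge": [0.7, 0.7, 0.0],
    "entity_knowledge": [0.0, 0.7, 0.7],
}


def squared_distance(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: List[float]) -> Tuple[str, List[float]]:
    """Standard VQ lookup: return the nearest codebook entry to z."""
    name = min(CODEBOOK, key=lambda k: squared_distance(z, CODEBOOK[k]))
    return name, CODEBOOK[name]


if __name__ == "__main__":
    # A continuous vector close to the "numerical" code snaps to it.
    print(quantize([0.9, 0.1, 0.05]))
```

The quantized vector replaces the continuous one, so the downstream model sees only a small, discrete set of codes. This discreteness is what makes a category-style control signal possible in generic VQ setups.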
