Natural language generation from structured data mainly focuses on surface-level descriptions and suffers from uncontrollable content selection and low fidelity. Previous work leverages logical forms to facilitate logical knowledge-conditioned text generation. Despite remarkable progress, these methods are data-hungry, which makes them difficult to adopt in real-world applications where data is limited. To this end, this paper proposes a unified framework for logical knowledge-conditioned text generation in the few-shot setting. With only a few seed logical forms (e.g., 20 or 100 shots), our approach leverages self-training and samples pseudo logical forms based on content and structure consistency. Experimental results demonstrate that our approach achieves better few-shot performance than the baselines.