We present ASPIRO, an approach for verbalising structured data into short template sentences in zero- to few-shot settings. Unlike previous methods, our approach prompts large language models (LLMs) to directly produce entity-agnostic templates, rather than relying on LLMs to faithfully copy the given example entities or on manually validating or crafting the templates. We incorporate LLM re-prompting, triggered by algorithmic parsing checks, as well as PARENT metric-induced consistency validation, to identify and rectify template generation problems in real time. Compared to direct LLM output, ASPIRO reduces the parsing error rate of generated RDF triple verbalisations on the DART dataset by 66\% on average. Our best 5-shot text-davinci-003 setup scores a BLEU of 50.62, METEOR of 45.16, BLEURT of 0.82, NUBIA of 0.87, and PARENT of 0.8962 on the Rel2Text dataset, competing effectively with recent fine-tuned pre-trained language models.
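The idea of re-prompting on a failed parsing check can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from ASPIRO itself: hypothetical `<subject>`/`<object>` placeholder tokens, a hypothetical `llm` callable that maps a prompt string to a completion, and two simple parsing rules (each placeholder appears exactly once, and the example entities are not copied into the template). The actual ASPIRO prompts, parsing checks, and the PARENT-based consistency validation are not reproduced here.

```python
from typing import Callable, Optional

# Hypothetical placeholder tokens (an assumption; not ASPIRO's documented format).
SUBJECT = "<subject>"
OBJECT = "<object>"


def parse_check(template: str, subject: str, obj: str) -> Optional[str]:
    """Return a problem description if the template fails the assumed rules, else None."""
    for placeholder in (SUBJECT, OBJECT):
        if template.count(placeholder) != 1:
            return f"the template must contain {placeholder} exactly once"
    # Entity-agnostic check: drop the placeholders, then look for copied example entities.
    rest = template.replace(SUBJECT, "").replace(OBJECT, "").lower()
    for entity in (subject, obj):
        if entity.lower() in rest:
            return f"the template copies the example entity {entity!r}"
    return None


def generate_template(
    llm: Callable[[str], str],
    relation: str,
    subject: str,
    obj: str,
    max_retries: int = 2,
) -> Optional[str]:
    """Ask the LLM for an entity-agnostic template; re-prompt when the parsing check fails."""
    prompt = (
        f"Write a one-sentence template for the relation '{relation}', "
        f"using {SUBJECT} and {OBJECT} instead of entities. "
        f"Example triple: ({subject}, {relation}, {obj})."
    )
    for _ in range(max_retries + 1):
        template = llm(prompt).strip()
        problem = parse_check(template, subject, obj)
        if problem is None:
            return template
        # Re-prompt: feed the failed output and the detected problem back to the model.
        prompt += f"\nPrevious answer: {template}\nProblem: {problem}. Please fix it."
    return None  # give up after max_retries re-prompts


def fill(template: str, subject: str, obj: str) -> str:
    """Instantiate a template with concrete entities."""
    return template.replace(SUBJECT, subject).replace(OBJECT, obj)


if __name__ == "__main__":
    # Scripted stand-in for a real LLM: the first answer has no placeholders, the second is valid.
    replies = iter(["Alan Turing was born in London.", "<subject> was born in <object>."])

    def scripted_llm(prompt: str) -> str:
        return next(replies)

    template = generate_template(scripted_llm, "birthPlace", "Alan Turing", "London")
    print(template)  # <subject> was born in <object>.
    print(fill(template, "Ada Lovelace", "London"))  # Ada Lovelace was born in London.
```

Because the template is entity-agnostic, it is generated once per relation and then reused for any subject/object pair via a plain string substitution, which is what the `fill` helper shows.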