Modern natural language generation systems built on Large Language Models (LLMs) can generate plausible summaries of multiple documents; however, it is unclear whether they truly consolidate information when generating summaries, especially for documents containing opinionated information. We focus on meta-review generation, a form of sentiment summarization for the scientific domain. To make scientific sentiment summarization more grounded, we hypothesize that human meta-reviewers follow a three-layer framework of sentiment consolidation when writing meta-reviews. Based on this framework, we propose novel prompting methods for LLMs to generate meta-reviews, along with evaluation metrics to assess the quality of the generated meta-reviews. Our framework is validated empirically: prompting LLMs based on the framework -- compared with prompting them with simple instructions -- produces better meta-reviews.