Large Language Models (LLMs) have shown promising performance in summary evaluation tasks, yet they face two challenges: high computational cost and the Lost-in-the-Middle problem, in which important information in the middle of long documents is often overlooked. To address these issues, this paper introduces a novel approach, Extract-then-Evaluate, which extracts key sentences from a long source document and then evaluates the summary by prompting LLMs. The results reveal that the proposed method not only significantly reduces evaluation costs but also exhibits a higher correlation with human evaluations. Furthermore, we provide practical recommendations for optimal document length and sentence extraction methods, contributing to the development of cost-effective yet more accurate methods for LLM-based text generation evaluation.
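The two-step pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the extraction method or the prompt wording, so everything here is an assumption made for illustration: the function names (`extract_key_sentences`, `build_eval_prompt`), the word-overlap scoring used to pick sentences, the `max_words` budget, and the prompt template are all hypothetical. The LLM call itself is left out; the sketch stops at the prompt that would be sent.

```python
import re


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    # Lowercased alphanumeric tokens; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_key_sentences(document, summary, max_words=50):
    """Step 1 (assumed heuristic): keep the source sentences that share
    the most vocabulary with the summary, within a word budget."""
    summary_vocab = set(tokens(summary))
    scored = []
    for idx, sent in enumerate(split_sentences(document)):
        toks = tokens(sent)
        if not toks:
            continue
        overlap = sum(1 for t in toks if t in summary_vocab) / len(toks)
        scored.append((overlap, idx, sent, len(toks)))
    # Highest overlap first; ties go to the earlier sentence.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen, used = [], 0
    for _, idx, sent, n_words in scored:
        if used + n_words > max_words:
            continue  # skip sentences that would exceed the budget
        chosen.append((idx, sent))
        used += n_words
    chosen.sort()  # restore original document order
    return [sent for _, sent in chosen]


def build_eval_prompt(extracted_sentences, summary):
    """Step 2 (hypothetical template): the prompt an LLM evaluator would
    receive, containing the shortened source instead of the full document."""
    source = " ".join(extracted_sentences)
    return (
        "Rate how consistent the summary is with the source on a 1-5 scale.\n\n"
        f"Source: {source}\n\n"
        f"Summary: {summary}\n\n"
        "Score:"
    )
```

Because the prompt contains only the extracted sentences, its length is bounded by `max_words` rather than by the length of the source document, which is where the cost reduction would come from under these assumptions.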