We study the ability of large language models (LLMs) to generate comprehensive and accurate book summaries solely from their internal knowledge, without recourse to the original text. Employing a diverse set of books and multiple LLM architectures, we examine whether these models can synthesize meaningful narratives that align with established human interpretations. Evaluation is performed with an LLM-as-a-judge paradigm: each AI-generated summary is compared against a high-quality, human-written summary via a cross-model assessment, in which all participating LLMs evaluate not only their own outputs but also those produced by others. This methodology enables the identification of potential biases, such as the tendency of models to favor their own summarization style over that of others. In addition, alignment between the human-crafted and LLM-generated summaries is quantified using ROUGE and BERTScore metrics, assessing the degree of lexical and semantic correspondence. The results reveal nuanced variations in content representation and stylistic preferences among the models, highlighting both strengths and limitations inherent in relying on internal knowledge for summarization tasks. These findings contribute to a deeper understanding of how LLMs internally encode factual information and of the dynamics of cross-model evaluation, with implications for the development of more robust natural language generative systems.
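The lexical-overlap component of the alignment measurement can be illustrated with a simplified sketch. This is not the evaluation pipeline from the study (which uses the full ROUGE suite and BERTScore); it is a minimal pure-Python ROUGE-1 F1 computation, shown only to make the unigram-overlap idea concrete:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between a candidate
    summary and a reference summary (whitespace tokenization)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each shared token counts at most min(cand, ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

In practice one would use an established implementation (e.g. a maintained ROUGE package) with proper tokenization and stemming; this sketch only captures the core precision/recall-over-unigrams structure that ROUGE-1 formalizes.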
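The self-preference bias mentioned above can be checked directly once the cross-model assessment produces a judge-by-author score matrix. The following is a hypothetical sketch (the model names and score scale are illustrative, not from the study): for each judge, it compares the score given to its own summary with the mean score it gives to the other models' summaries.

```python
def self_preference(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Given scores[judge][author] from a cross-model evaluation,
    return each judge's self-score minus its mean score for others.
    A positive value suggests the judge favors its own summaries."""
    bias = {}
    for judge, row in scores.items():
        other_scores = [s for author, s in row.items() if author != judge]
        bias[judge] = row[judge] - sum(other_scores) / len(other_scores)
    return bias

# Illustrative matrix: judge "model_a" rates its own summary 9/10
# but gives "model_b" only 7/10, hinting at self-preference.
example = {
    "model_a": {"model_a": 9.0, "model_b": 7.0},
    "model_b": {"model_a": 8.0, "model_b": 8.0},
}
```

A judge with a near-zero value rates its own output no differently from others'; consistently positive values across judges would be the bias signature the cross-model design is meant to surface.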