Not too long do read: Evaluating LLM-generated extreme scientific summaries - Zhuanzhi (专知) Papers


December 29, 2025


Zhuoqi Lyu, Qing Ke

High-quality extreme scientific summaries (TLDRs) facilitate effective science communication. How well do large language models (LLMs) generate them, and how do LLM-generated summaries differ from those written by human experts? The lack of a comprehensive, high-quality scientific TLDR dataset hinders both the development and the evaluation of LLMs' summarization ability. To address these gaps, we propose a novel dataset, BiomedTLDR, containing a large sample of researcher-authored summaries of scientific papers, which leverages the common practice of including authors' comments alongside bibliography items. We then test popular open-weight LLMs on generating TLDRs from abstracts. Our analysis reveals that, although some of them successfully produce human-like summaries, LLMs generally exhibit a greater affinity for the original text's lexical choices and rhetorical structures, and hence tend to be more extractive than abstractive compared to humans. Our code and datasets are available at https://github.com/netknowledge/LLM_summarization (Lyu and Ke, 2025).
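
To make the extractive versus abstractive distinction concrete, the sketch below computes the share of a summary's n-grams that do not appear in the source abstract. This is only an illustration of one common way to quantify lexical copying; it is not claimed to be the metric used by the authors, and the function names (`tokenize`, `ngrams`, `novel_ngram_ratio`) and the example sentences are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(source, summary, n=2):
    """Fraction of the summary's distinct n-grams that are absent from the source.

    0.0 means every summary n-gram is copied from the source (fully extractive
    at this n); 1.0 means none are shared. An empty summary returns 0.0 by
    convention (an assumption of this sketch).
    """
    summary_grams = ngrams(tokenize(summary), n)
    if not summary_grams:
        return 0.0
    source_grams = ngrams(tokenize(source), n)
    return len(summary_grams - source_grams) / len(summary_grams)


# Hypothetical example texts, not taken from BiomedTLDR.
abstract = "We propose a novel dataset containing researcher-authored summaries."
copied = "We propose a novel dataset."
reworded = "This paper introduces new data."

print(novel_ngram_ratio(abstract, copied))    # every bigram is reused from the abstract
print(novel_ngram_ratio(abstract, reworded))  # no bigram is shared with the abstract
```

A lower ratio indicates a more extractive summary. Comparing the ratio for human-written and model-written TLDRs of the same abstract would give a rough view of the tendency the abstract describes, under the simplifying assumptions above.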



