Play the Shannon Game With Language Models: A Human-Free Approach to Summary Evaluation - 专知论文

会员服务 ·

0

语言模型化 · 香农 · INFORMS · MoDELS · Extensibility ·

2021 年 3 月 19 日

Play the Shannon Game With Language Models: A Human-Free Approach to Summary Evaluation

翻译：与语言模式一起玩香农游戏:对简要评价采取无人问

Nicholas Egan,Oleg Vasilyev,John Bohannon

The goal of a summary is to concisely state the most important information in a document. With this principle in mind, we introduce new reference-free summary evaluation metrics that use a pretrained language model to estimate the information shared between a document and its summary. These metrics are a modern take on the Shannon Game, a method for summary quality scoring proposed decades ago, where we replace human annotators with language models. We also view these metrics as an extension of BLANC, a recently proposed approach to summary quality measurement based on the performance of a language model with and without the help of a summary. Using GPT-2, we empirically verify that the introduced metrics correlate with human judgement based on coverage, overall quality, and five summary dimensions.

翻译：摘要的目的是在文件中简明扼要地说明最重要的信息。考虑到这一原则,我们采用新的不附带参考的简要评价指标,使用预先培训的语言模型来估计文件及其摘要之间共享的信息。这些指标是对“香农游戏”的现代分析,这是几十年前提出的一个总结质量评分方法,我们用语言模型取代人类评分器。我们还将这些指标视为BLANC的延伸。BLANC是最近提出的一种根据语言模型的性能和不借助摘要衡量质量的汇总方法。我们利用GPT-2, 实证地证明引入的指标与基于覆盖面、总体质量和五个概要层面的人类判断相关。

0

相关内容

语言模型化

语言模型化

【2020Manning新书】微前端实战，Micro Frontends in Action，296页pdf

【2020Manning新书】微前端实战，Micro Frontends in Action，296页pdf

专知会员服务

58+阅读 · 2020年8月28日

【Manning新书】现代Java实战，592页pdf

【Manning新书】现代Java实战，592页pdf

专知会员服务

101+阅读 · 2020年5月22日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

专知会员服务

133+阅读 · 2019年10月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

208+阅读 · 2019年9月30日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

已删除

将门创投

3+阅读 · 2019年1月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Towards Human-Free Automatic Quality Evaluation of German Summarization

Arxiv

0+阅读 · 2021年5月13日

Playing Codenames with Language Graphs and Word Embeddings

Arxiv

0+阅读 · 2021年5月12日

Learning and Planning in Complex Action Spaces

Arxiv

4+阅读 · 2021年4月13日

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Arxiv

3+阅读 · 2019年9月3日

AutoML: A Survey of the State-of-the-Art

AutoML: A Survey of the State-of-the-Art

Arxiv

75+阅读 · 2019年8月14日

Automatic Summarization of Natural Language

Arxiv

3+阅读 · 2018年12月18日

Deep learning evaluation using deep linguistic processing

Arxiv

3+阅读 · 2018年5月12日

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

Arxiv

4+阅读 · 2018年4月26日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

Arxiv

5+阅读 · 2017年12月12日

VIP会员

文章信息

相关主题

语言模型化

最新内容

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

1+阅读 · 今天16:54

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

1+阅读 · 今天16:52

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

6+阅读 · 今天8:00

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

5+阅读 · 今天7:44

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

4+阅读 · 今天7:28

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

4+阅读 · 今天7:18

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

专知会员服务

5+阅读 · 今天7:07

军事欺骗：供作战战术指挥官使用的工具

军事欺骗：供作战战术指挥官使用的工具

专知会员服务

4+阅读 · 今天7:03

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

4+阅读 · 6月23日

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

6+阅读 · 6月23日

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

10+阅读 · 6月23日

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

4+阅读 · 6月23日

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

5+阅读 · 6月23日

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

8+阅读 · 6月23日

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

7+阅读 · 6月23日

相关VIP内容

【2020Manning新书】微前端实战，Micro Frontends in Action，296页pdf

【2020Manning新书】微前端实战，Micro Frontends in Action，296页pdf

专知会员服务

58+阅读 · 2020年8月28日

【Manning新书】现代Java实战，592页pdf

【Manning新书】现代Java实战，592页pdf

专知会员服务

101+阅读 · 2020年5月22日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

清华刘洋与邓力合著338页新书《Deep Learning in Natural Language Processing》

专知会员服务

133+阅读 · 2019年10月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

208+阅读 · 2019年9月30日

热门VIP内容

开通专知VIP会员享更多权益服务

Agentic RL：框架、实践与长程智能体训练

重新思考无人机时代的生存能力

综述 | 从问答到任务完成：Agent系统与Harness设计

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

相关资讯

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

已删除

将门创投

3+阅读 · 2019年1月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Towards Human-Free Automatic Quality Evaluation of German Summarization

Arxiv

0+阅读 · 2021年5月13日

Playing Codenames with Language Graphs and Word Embeddings

Arxiv

0+阅读 · 2021年5月12日

Learning and Planning in Complex Action Spaces

Arxiv

4+阅读 · 2021年4月13日

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Arxiv

3+阅读 · 2019年9月3日

AutoML: A Survey of the State-of-the-Art

AutoML: A Survey of the State-of-the-Art

Arxiv

75+阅读 · 2019年8月14日

Automatic Summarization of Natural Language

Arxiv

3+阅读 · 2018年12月18日

Deep learning evaluation using deep linguistic processing

Arxiv

3+阅读 · 2018年5月12日

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

Arxiv

4+阅读 · 2018年4月26日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

Arxiv

5+阅读 · 2017年12月12日

微信扫码咨询专知VIP会员