What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization - 专知论文

会员服务 ·

0

可辨认的 · 优化器 · 多样性 · 相关系数 · MoDELS ·

2023 年 5 月 12 日

What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization

翻译：校准集应具备哪些理想特征？——长文本科学摘要中相关因素的识别

Griffin Adams,Bichlien H Nguyen,Jake Smith,Yingce Xia,Shufang Xie,Anna Ostropolets,Budhaditya Deb,Yuan-Jyue Chen,Tristan Naumann,Noémie Elhadad

from arxiv, ACL 2023

Summarization models often generate text that is poorly calibrated to quality metrics because they are trained to maximize the likelihood of a single reference (MLE). To address this, recent work has added a calibration step, which exposes a model to its own ranked outputs to improve relevance or, in a separate line of work, contrasts positive and negative sets to improve faithfulness. While effective, much of this work has focused on how to generate and optimize these sets. Less is known about why one setup is more effective than another. In this work, we uncover the underlying characteristics of effective sets. For each training instance, we form a large, diverse pool of candidates and systematically vary the subsets used for calibration fine-tuning. Each selection strategy targets distinct aspects of the sets, such as lexical diversity or the size of the gap between positive and negatives. On three diverse scientific long-form summarization datasets (spanning biomedical, clinical, and chemical domains), we find, among others, that faithfulness calibration is optimal when the negative sets are extractive and more likely to be generated, whereas for relevance calibration, the metric margin between candidates should be maximized and surprise--the disagreement between model and metric defined candidate rankings--minimized. Code to create, select, and optimize calibration sets is available at https://github.com/griff4692/calibrating-summaries

翻译：摘要：摘要生成模型通常由于仅以最大化单个参考文本的似然（MLE）为目标进行训练，导致生成文本与质量指标的校准效果不佳。为解决这一问题，近期研究引入了校准步骤：一方面通过向模型展示其自身排序后的输出结果来提升相关性；另一方面，在另一独立研究线路中，通过对比正样本集与负样本集来增强忠实度。尽管上述方法有效，但现有工作多集中于如何生成与优化这些样本集，对不同设置间产生效果差异的根本原因却知之甚少。本研究旨在揭示有效样本集的底层特征。针对每个训练实例，我们构建一个大规模多样化的候选池，并系统地改变用于校准微调的子集。每种选择策略针对样本集的不同维度，例如词汇多样性或正负样本间的差距大小。在三个多样化的科学长文本摘要数据集（涵盖生物医学、临床医学和化学领域）中，我们发现：当负样本集具备抽取式特征且更易生成时，忠实度校准效果最优；而对于相关性校准，则需最大化候选样本间的指标差距，同时最小化“惊讶度”——即模型与指标定义下候选样本排名之间的分歧。创建、选择及优化校准集的代码已发布于 https://github.com/griff4692/calibrating-summaries

0

相关内容

可辨认的

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

IL-32/Integrins/FAK通路在肝纤维化形成中的作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于同步轨道圆迹雷达的三维地表形变监测方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

解析RELM-α调控中性粒细胞迁移机制对重症急性胰腺炎继发急性肺损伤的影响研究

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

石墨烯-介孔氧化物对有机污染物的吸附、催化及分析检测

国家自然科学基金

0+阅读 · 2012年12月31日

PI-IBS中TMEM16A介导IL-4对Cajal细胞损伤的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

mTOR激活对吗啡耐受的调控及其分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

复合污染物对磁性多孔微纳结构氧化物表面的协同吸附研究

国家自然科学基金

0+阅读 · 2009年12月31日

针灸治疗大鼠CD肠纤维化Smads与ERK-1/2MAPK信号通路Cross talk研究

国家自然科学基金

0+阅读 · 2009年12月31日

A Meta-Learning Method for Estimation of Causal Excursion Effects to Assess Time-Varying Moderation

Arxiv

0+阅读 · 2023年6月28日

Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting

Arxiv

0+阅读 · 2023年6月28日

Refining the Adaptivity Notion in the Huge Object Model

Arxiv

0+阅读 · 2023年6月28日

You Can Generate It Again: Data-to-text Generation with Verification and Correction Prompting

Arxiv

0+阅读 · 2023年6月28日

An Adaptive Neighborhood Partition Full Conditional Mutual Information Maximization Method for Feature Selection

Arxiv

0+阅读 · 2023年6月27日

Evidential Calibration of Confidence Intervals

Arxiv

0+阅读 · 2023年6月27日

LViT: Language meets Vision Transformer in Medical Image Segmentation

Arxiv

0+阅读 · 2023年6月27日

DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles

Arxiv

0+阅读 · 2023年6月24日

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Arxiv

21+阅读 · 2021年9月2日

The Causal Learning of Retail Delinquency

Arxiv

15+阅读 · 2020年12月17日

VIP会员

文章信息

相关主题

最新内容

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

2+阅读 · 7月27日

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

3+阅读 · 7月27日

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

11+阅读 · 7月27日

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

7+阅读 · 7月27日

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

6+阅读 · 7月27日

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

4+阅读 · 7月27日

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

10+阅读 · 7月27日

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

6+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

9+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

11+阅读 · 7月26日

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

专知会员服务

8+阅读 · 7月26日

《反无人机交战场景下的战斗归零研究》

《反无人机交战场景下的战斗归零研究》

专知会员服务

7+阅读 · 7月26日

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

专知会员服务

4+阅读 · 7月26日

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

6+阅读 · 7月25日

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

美空军新型反无人机部队初探

博士论文 | 面向大模型推理的内存高效算法

《无人系统互操作性导论——无人系统联合架构（JAUS）》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

A Meta-Learning Method for Estimation of Causal Excursion Effects to Assess Time-Varying Moderation

Arxiv

0+阅读 · 2023年6月28日

Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting

Arxiv

0+阅读 · 2023年6月28日

Refining the Adaptivity Notion in the Huge Object Model

Arxiv

0+阅读 · 2023年6月28日

You Can Generate It Again: Data-to-text Generation with Verification and Correction Prompting

Arxiv

0+阅读 · 2023年6月28日

An Adaptive Neighborhood Partition Full Conditional Mutual Information Maximization Method for Feature Selection

Arxiv

0+阅读 · 2023年6月27日

Evidential Calibration of Confidence Intervals

Arxiv

0+阅读 · 2023年6月27日

LViT: Language meets Vision Transformer in Medical Image Segmentation

Arxiv

0+阅读 · 2023年6月27日

DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles

Arxiv

0+阅读 · 2023年6月24日

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Arxiv

21+阅读 · 2021年9月2日

The Causal Learning of Retail Delinquency

Arxiv

15+阅读 · 2020年12月17日

相关基金

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

IL-32/Integrins/FAK通路在肝纤维化形成中的作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于同步轨道圆迹雷达的三维地表形变监测方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

解析RELM-α调控中性粒细胞迁移机制对重症急性胰腺炎继发急性肺损伤的影响研究

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

石墨烯-介孔氧化物对有机污染物的吸附、催化及分析检测

国家自然科学基金

0+阅读 · 2012年12月31日

PI-IBS中TMEM16A介导IL-4对Cajal细胞损伤的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

mTOR激活对吗啡耐受的调控及其分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

复合污染物对磁性多孔微纳结构氧化物表面的协同吸附研究

国家自然科学基金

0+阅读 · 2009年12月31日

针灸治疗大鼠CD肠纤维化Smads与ERK-1/2MAPK信号通路Cross talk研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员