Prompting and in-context learning (ICL) have become efficient learning paradigms for large language models (LLMs). However, LLMs suffer from prompt brittleness and various bias factors in the prompt, including but not limited to the formatting, the choice of verbalizers, and the ICL examples. To mitigate the unexpected performance degradation these biases cause, calibration methods have been developed to counteract their effects while recovering LLM performance. In this work, we first conduct a systematic analysis of existing calibration methods, providing a unified view and revealing their failure cases. Inspired by these analyses, we propose Batch Calibration (BC), a simple yet intuitive method that controls the contextual bias from the batched input, unifies various prior approaches, and effectively addresses the aforementioned issues. BC is zero-shot, inference-only, and incurs negligible additional costs. In the few-shot setup, we further extend BC to allow it to learn the contextual bias from labeled data. We validate the effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks.
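The core idea of controlling contextual bias from a batched input can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the contextual prior over classes is estimated as the mean of the model's predicted class probabilities across the batch, and that this prior is subtracted from each prediction before taking the argmax. The toy `probs` matrix stands in for real LLM class scores and is purely illustrative.

```python
import numpy as np

def batch_calibrate(probs: np.ndarray) -> np.ndarray:
    """probs: (batch_size, n_classes) predicted class probabilities.

    Returns calibrated scores with the batch-level contextual bias removed.
    """
    # Estimate the contextual prior as the mean probability of each
    # class over the whole batch (no labels needed: zero-shot setup).
    prior = probs.mean(axis=0, keepdims=True)
    # Subtracting the prior centers the scores, removing the shared bias.
    return probs - prior

# Toy example: raw scores are systematically biased toward class 0.
probs = np.array([
    [0.70, 0.30],   # near-ambiguous input, pushed toward class 0 by bias
    [0.90, 0.10],   # confidently class 0
    [0.55, 0.45],   # borderline input, tipped to class 0 by bias
])
raw_preds = probs.argmax(axis=1)                   # all class 0
bc_preds = batch_calibrate(probs).argmax(axis=1)   # borderline cases flip
```

Because calibration happens purely on the output scores of a batch, the procedure adds no model calls beyond ordinary inference, which is consistent with the abstract's claim that BC is inference-only with negligible extra cost.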