How Many Demonstrations Do You Need for In-context Learning?


April 24, 2023


Jiuhai Chen,Lichang Chen,Chen Zhu,Tianyi Zhou

Large language models (LLMs) are capable of performing complex reasoning through in-context learning (ICL) when provided with a few input-output demonstrations (demos), and they are more powerful when the intermediate reasoning steps ("chain of thought", CoT) of the demos are also given. Is it necessary to use multiple demos in ICL? In this paper, we study ICL using fewer demos for each test query on the tasks in Wei et al. (2022). Surprisingly, we do not observe significant degradation when using only one randomly chosen demo. To study this phenomenon, for each test query we categorize demos into "correct demos", which lead to the correct answer, and "wrong demos", which result in wrong answers. Our analysis reveals an inherent bias in these widely studied datasets: most demos are correct for a majority of test queries, which explains the good performance of using one random demo. Moreover, ICL (with and without CoT) using only one correct demo significantly outperforms the all-demo ICL adopted by most previous works, indicating a weakness of LLMs in finding the correct demo(s) for input queries, which is difficult to evaluate on the biased datasets. Furthermore, we observe a counterintuitive behavior of multi-demo ICL: its accuracy degrades when given more correct demos and improves when given more wrong demos. This implies that ICL can be easily misguided by interference among demos and their spurious correlations. Our analyses highlight several fundamental challenges that need to be addressed in LLM training, ICL, and benchmark design.
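The per-query split into "correct" and "wrong" demos described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function `split_demos`, the `predict` callable, and the toy arithmetic task are all hypothetical names introduced here, and `toy_predict` is only a stand-in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

Demo = Tuple[str, str]  # (question, answer); hypothetical representation


def split_demos(
    query: str,
    gold: str,
    demos: List[Demo],
    predict: Callable[[List[Demo], str], str],
) -> Dict[str, List[Demo]]:
    """Partition demos by whether one-demo prompting answers `query` correctly."""
    groups: Dict[str, List[Demo]] = {"correct": [], "wrong": []}
    for demo in demos:
        # Prompt with exactly one demo, then compare against the gold answer.
        answer = predict([demo], query)
        groups["correct" if answer == gold else "wrong"].append(demo)
    return groups


def toy_predict(demos: List[Demo], query: str) -> str:
    # Stand-in for an LLM (assumption): it applies the operation named
    # in the single demo, regardless of what the query asks for.
    op = demos[0][0].split()[0]
    a, b = map(int, query.split()[1:])
    return str(a + b if op == "add" else a * b)


demos = [("add 1 2", "3"), ("mul 2 3", "6"), ("add 4 5", "9")]
groups = split_demos("add 2 5", "7", demos, toy_predict)

# Fraction of demos that are "correct" for this query; a high value across
# queries corresponds to the dataset bias the abstract describes.
correct_ratio = len(groups["correct"]) / len(demos)
print(groups, correct_ratio)
```

In this toy setup a single random demo would answer the query correctly whenever it falls in the "correct" group, which is why a dataset where most demos are correct for most queries can mask how well a model selects among demos.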

