Leveraging Large Language Models for Multiple Choice Question Answering - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · 自动问答 · 可约的 · SOTA ·

2023 年 3 月 17 日

Leveraging Large Language Models for Multiple Choice Question Answering

翻译：利用大语言模型进行多项选择问答

Joshua Robinson,Christopher Michael Rytting,David Wingate

from arxiv, Accepted for ICLR 2023

While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero, one, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs like cloze tasks. An LLM is conditioned on a question (without the associated answer options) and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., "A") associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs, and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM it is used with must be able to associate answer options with the symbols that represent them. The LLM needs what we term multiple choice symbol binding (MCSB) ability. This ability varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.

翻译：尽管GPT-3等大语言模型在零样本、单样本和少样本场景下的多项选择问答任务中取得了显著成果，但它们普遍落后于多项选择问答领域的技术水平。传统上，多项选择问答任务被呈现给大语言模型的方式类似于完形填空任务：大语言模型基于一个问题（不含相关选项）进行条件生成，其选中的选项为经过标准化（如长度）处理后概率最高的那个。一种更自然的提示方法是联合向大语言模型呈现问题与答案选项，并让其输出与所选答案选项对应的符号（例如“A”）。这种方法允许模型显式比较各选项，降低计算成本，并减轻分词方案及答案选项表示对答案选择的影响。为使自然方法有效，所用的大语言模型必须能够将答案选项与其对应的符号关联起来，即模型需要具备我们称之为多项选择符号绑定（MCSB）的能力。这种能力在不同模型中差异显著。我们证明，具备高MCSB能力的模型在20个多样化数据集中采用自然方法的表现远优于传统方法，并基本拉平了与技术水平的差距，这表明大语言模型在多项选择问答中的能力此前被低估了。

0

相关内容

语言模型化

语言模型化

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【伯克利Roshan Rao博士论文】训练，评估和理解蛋白质序列的进化模型，Training, Evaluating, and Understanding Evolutionary Models for Protein Sequences

【伯克利Roshan Rao博士论文】训练，评估和理解蛋白质序列的进化模型，Training, Evaluating, and Understanding Evolutionary Models for Protein Sequences

专知会员服务

17+阅读 · 2022年3月6日

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

专知会员服务

26+阅读 · 2022年3月1日

【USC2021】常识推理，47页ppt，Commonsense Reasoning in the Wild

专知会员服务

33+阅读 · 2021年10月9日

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

专知会员服务

73+阅读 · 2020年5月30日

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

专知会员服务

97+阅读 · 2020年4月10日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

ICLR 2023 | PromptPG：当强化学习遇见大规模语言模型

ICLR 2023 | PromptPG：当强化学习遇见大规模语言模型

机器之心

2+阅读 · 2023年4月6日

论文浅尝 | Language Models (Mostly) Know What They Know

论文浅尝 | Language Models (Mostly) Know What They Know

开放知识图谱

2+阅读 · 2022年11月18日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

回声干扰抑制中的自适应信号处理算法研究

国家自然科学基金

1+阅读 · 2015年12月31日

运力过剩下的航运服务网络设计问题

国家自然科学基金

0+阅读 · 2013年12月31日

基于协同计算的社区问答意见型问题分析与答案生成研究

国家自然科学基金

0+阅读 · 2013年12月31日

CPU/GPGPU紧耦合异构多核系统共享Last Level Cache优化研究

国家自然科学基金

0+阅读 · 2012年12月31日

物联网中跨网密钥管理研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于信道Time/Power度量指标的TOA测距误差模型及其应用研究

国家自然科学基金

0+阅读 · 2011年12月31日

胶质细胞CBR2激活在电针预处理诱导延迟相脑缺血耐受中作用

国家自然科学基金

0+阅读 · 2011年12月31日

物联网轻量级健壮安全中的关键问题研究

国家自然科学基金

1+阅读 · 2011年12月31日

活性氧在心肌缺血再灌注损伤保护作用中的新机理

国家自然科学基金

0+阅读 · 2011年12月31日

针刺对脑缺血损伤和保护机制相关信号蛋白磷酸化作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

Large Language Model Programs

Arxiv

0+阅读 · 2023年5月9日

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Arxiv

0+阅读 · 2023年5月7日

Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model

Arxiv

0+阅读 · 2023年5月7日

Query Expansion by Prompting Large Language Models

Arxiv

0+阅读 · 2023年5月5日

Progressive-Hint Prompting Improves Reasoning in Large Language Models

Arxiv

1+阅读 · 2023年5月5日

Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation

Arxiv

0+阅读 · 2023年5月4日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Arxiv

20+阅读 · 2021年5月27日

An Interpretable Reasoning Network for Multi-Relation Question Answering

Arxiv

13+阅读 · 2018年6月1日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

VIP会员

文章信息

相关主题

语言模型化

最新内容

论文解读 | 医学图像修复中的扩散模型：挑战、分类与未来方向

论文解读 | 医学图像修复中的扩散模型：挑战、分类与未来方向

专知会员服务

1+阅读 · 7月28日

博士论文 | 从算法到基础模型：强化学习的统一视角

博士论文 | 从算法到基础模型：强化学习的统一视角

专知会员服务

1+阅读 · 7月28日

面向国防作战的最佳自主与蜂群无人机技术

面向国防作战的最佳自主与蜂群无人机技术

专知会员服务

6+阅读 · 7月28日

《异构人类团队的协作决策过程混合建模研究》

《异构人类团队的协作决策过程混合建模研究》

专知会员服务

5+阅读 · 7月28日

《C5ISR系统中的注意力动态与自适应决策支持研究：视觉与多模态注意力引导对任务绩效影响的递归量化分析》最新36页报告

《C5ISR系统中的注意力动态与自适应决策支持研究：视觉与多模态注意力引导对任务绩效影响的递归量化分析》最新36页报告

专知会员服务

5+阅读 · 7月28日

《设计思维中的人机协作：生成式人工智能对共情访谈影响的探究》140页

《设计思维中的人机协作：生成式人工智能对共情访谈影响的探究》140页

专知会员服务

6+阅读 · 7月28日

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

5+阅读 · 7月27日

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

8+阅读 · 7月27日

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

13+阅读 · 7月27日

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

9+阅读 · 7月27日

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

7+阅读 · 7月27日

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

5+阅读 · 7月27日

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

12+阅读 · 7月27日

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

7+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

10+阅读 · 7月26日

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【伯克利Roshan Rao博士论文】训练，评估和理解蛋白质序列的进化模型，Training, Evaluating, and Understanding Evolutionary Models for Protein Sequences

【伯克利Roshan Rao博士论文】训练，评估和理解蛋白质序列的进化模型，Training, Evaluating, and Understanding Evolutionary Models for Protein Sequences

专知会员服务

17+阅读 · 2022年3月6日

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

专知会员服务

26+阅读 · 2022年3月1日

【USC2021】常识推理，47页ppt，Commonsense Reasoning in the Wild

专知会员服务

33+阅读 · 2021年10月9日

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

1750亿参数！GPT-3来了！31位作者，OpenAI发布小样本学习器语言模型

专知会员服务

73+阅读 · 2020年5月30日

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

专知会员服务

97+阅读 · 2020年4月10日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

热门VIP内容

开通专知VIP会员享更多权益服务

博士论文 | 从算法到基础模型：强化学习的统一视角

《异构人类团队的协作决策过程混合建模研究》

论文解读 | 医学图像修复中的扩散模型：挑战、分类与未来方向

面向国防作战的最佳自主与蜂群无人机技术

相关资讯

ICLR 2023 | PromptPG：当强化学习遇见大规模语言模型

ICLR 2023 | PromptPG：当强化学习遇见大规模语言模型

机器之心

2+阅读 · 2023年4月6日

论文浅尝 | Language Models (Mostly) Know What They Know

论文浅尝 | Language Models (Mostly) Know What They Know

开放知识图谱

2+阅读 · 2022年11月18日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

相关论文

Large Language Model Programs

Arxiv

0+阅读 · 2023年5月9日

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Arxiv

0+阅读 · 2023年5月7日

Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model

Arxiv

0+阅读 · 2023年5月7日

Query Expansion by Prompting Large Language Models

Arxiv

0+阅读 · 2023年5月5日

Progressive-Hint Prompting Improves Reasoning in Large Language Models

Arxiv

1+阅读 · 2023年5月5日

Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation

Arxiv

0+阅读 · 2023年5月4日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Arxiv

20+阅读 · 2021年5月27日

An Interpretable Reasoning Network for Multi-Relation Question Answering

Arxiv

13+阅读 · 2018年6月1日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

相关基金

回声干扰抑制中的自适应信号处理算法研究

国家自然科学基金

1+阅读 · 2015年12月31日

运力过剩下的航运服务网络设计问题

国家自然科学基金

0+阅读 · 2013年12月31日

基于协同计算的社区问答意见型问题分析与答案生成研究

国家自然科学基金

0+阅读 · 2013年12月31日

CPU/GPGPU紧耦合异构多核系统共享Last Level Cache优化研究

国家自然科学基金

0+阅读 · 2012年12月31日

物联网中跨网密钥管理研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于信道Time/Power度量指标的TOA测距误差模型及其应用研究

国家自然科学基金

0+阅读 · 2011年12月31日

胶质细胞CBR2激活在电针预处理诱导延迟相脑缺血耐受中作用

国家自然科学基金

0+阅读 · 2011年12月31日

物联网轻量级健壮安全中的关键问题研究

国家自然科学基金

1+阅读 · 2011年12月31日

活性氧在心肌缺血再灌注损伤保护作用中的新机理

国家自然科学基金

0+阅读 · 2011年12月31日

针刺对脑缺血损伤和保护机制相关信号蛋白磷酸化作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员