Prompting GPT-3 To Be Reliable


February 15, 2023


Chenglei Si,Zhe Gan,Zhengyuan Yang,Shuohang Wang,Jianfeng Wang,Jordan Boyd-Graber,Lijuan Wang

From arXiv; ICLR 2023

Large language models (LLMs) show impressive abilities via few-shot prompting. Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world language applications. However, the crucial problem of how to improve the reliability of GPT-3 is still under-explored. While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality. Our core contribution is to establish simple and effective prompts that improve GPT-3's reliability as it: 1) generalizes out-of-distribution, 2) balances demographic distribution and uses natural language instructions to reduce social biases, 3) calibrates output probabilities, and 4) updates the LLM's factual knowledge and reasoning chains. With appropriate prompts, GPT-3 is more reliable than smaller-scale supervised models on all these facets. We release all processed datasets, evaluation scripts, and model predictions. Our systematic empirical study not only sheds new insights on the reliability of prompting LLMs, but more importantly, our prompting strategies can help practitioners more reliably use LLMs like GPT-3.
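As an illustration of the calibration facet, one common way to check whether output probabilities are well calibrated is the expected calibration error (ECE). The abstract does not say which metric or binning the authors use, so the following is only a minimal sketch under assumptions: the function name, the equal-width binning, and the default of 10 bins are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Weighted average of |accuracy - mean confidence| over confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Group predictions by confidence; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical predictions: the model claims full confidence but is right half the time.
    overconfident = expected_calibration_error([1.0] * 4, [True, True, False, False])
    print(f"ECE of an overconfident model: {overconfident:.2f}")
```

A lower value means the stated confidence tracks the observed accuracy more closely; a model that reports 90% confidence and is right 90% of the time scores close to zero.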



