Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models - 专知论文

会员服务 ·

0

语言模型化 · 置信度 · 黑盒 · MoDELS · 预测器/决策函数 ·

2023 年 5 月 30 日

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models

翻译：自信生成：黑箱大语言模型的不确定性量化

Zhen Lin,Shubhendu Trivedi,Jimeng Sun

Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities across a variety of domains. However, gauging the trustworthiness of responses generated by LLMs remains an open challenge, with limited research on uncertainty quantification for NLG. Furthermore, existing literature typically assumes white-box access to language models, which is becoming unrealistic either due to the closed-source nature of the latest LLMs or due to computational constraints. In this work, we investigate uncertainty quantification in NLG for $\textit{black-box}$ LLMs. We first differentiate two closely-related notions: $\textit{uncertainty}$, which depends only on the input, and $\textit{confidence}$, which additionally depends on the generated response. We then propose and compare several confidence/uncertainty metrics, applying them to $\textit{selective NLG}$, where unreliable results could either be ignored or yielded for further assessment. Our findings on several popular LLMs and datasets reveal that a simple yet effective metric for the average semantic dispersion can be a reliable predictor of the quality of LLM responses. This study can provide valuable insights for practitioners on uncertainty management when adopting LLMs. The code to replicate all our experiments is available at https://github.com/zlin7/UQ-NLG.

翻译：大型语言模型（LLMs）在自然语言生成（NLG）领域近期展现出跨领域的显著能力。然而，评估LLM生成响应的可信度仍是一项开放挑战，针对NLG不确定性量化的研究尚不充分。此外，现有文献通常假设可白盒访问语言模型，但这一假设因最新LLM的闭源特性或计算约束而日益不切实际。本研究探讨了黑箱LLM在NLG中的不确定性量化问题。我们首先区分两个紧密相关的概念：仅依赖于输入的“不确定性”，以及额外依赖于生成响应的“置信度”。随后，我们提出并比较了多种置信度/不确定性度量指标，并将其应用于“选择性NLG”场景——其中不可靠结果可被忽略或提交进一步评估。基于多个流行LLM和数据集的研究发现，一个简单而有效的平均语义离散度指标可作为LLM响应质量的可靠预测器。本研究可为实践者在采用LLM时的不确定性管理提供重要见解。复现所有实验的代码可从https://github.com/zlin7/UQ-NLG获取。

0

相关内容

语言模型化

语言模型化

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

高原鼢鼠（Myospalax baileyi）扩散机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

AMPK-Beclin-1/Vps34通路在维生素D3（Vit D)诱导足细胞自噬中的作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

DARA效应位错机制的TEM揭示和分子动力学模拟的验证

国家自然科学基金

1+阅读 · 2013年12月31日

介尺度磁性复合囊泡状结构材料的可控构筑及性能

国家自然科学基金

0+阅读 · 2012年12月31日

Nrf2-ARE信号通路在氢气干预新生儿坏死性小肠结肠炎中的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

β钛合金中孪生诱导塑性(TWIP)效应

国家自然科学基金

0+阅读 · 2012年12月31日

关系的分解与Domain的表示

国家自然科学基金

1+阅读 · 2011年12月31日

Cystatin B缺失与Prion疾病自噬作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Wnt/β-catenin通路在内皮抑素抑制动脉粥样硬化斑块内微血管新生中的作用及其机制

国家自然科学基金

0+阅读 · 2011年12月31日

Re合金化镍基单晶高温合金的强韧化机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

Generating Mathematical Derivations with Large Language Models

Arxiv

0+阅读 · 2023年7月19日

Revisiting Softmax for Uncertainty Approximation in Text Classification

Arxiv

0+阅读 · 2023年7月19日

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

Arxiv

0+阅读 · 2023年7月19日

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

Arxiv

0+阅读 · 2023年7月19日

PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models

Arxiv

0+阅读 · 2023年7月18日

A Survey on Evaluation of Large Language Models

Arxiv

0+阅读 · 2023年7月18日

On the application of Large Language Models for language teaching and assessment technology

Arxiv

0+阅读 · 2023年7月17日

In-IDE Generation-based Information Support with a Large Language Model

Arxiv

0+阅读 · 2023年7月17日

Leveraging Large Language Models to Generate Answer Set Programs

Arxiv

0+阅读 · 2023年7月15日

A Survey on Generative Diffusion Model

Arxiv

46+阅读 · 2022年9月6日

VIP会员

文章信息

相关主题

语言模型化

预测器/决策函数

最新内容

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

1+阅读 · 今天14:45

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

1+阅读 · 今天14:43

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

4+阅读 · 今天14:31

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

3+阅读 · 今天14:20

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

2+阅读 · 今天14:11

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

3+阅读 · 今天14:07

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

3+阅读 · 今天14:03

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

2+阅读 · 今天13:59

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

5+阅读 · 6月22日

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

8+阅读 · 6月22日

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

7+阅读 · 6月22日

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

4+阅读 · 6月22日

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

5+阅读 · 6月22日

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

5+阅读 · 6月22日

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

8+阅读 · 6月22日

相关VIP内容

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 世界动作模型：少做梦，多行动

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

美以伊冲突：无人机与人工智能的运用

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Generating Mathematical Derivations with Large Language Models

Arxiv

0+阅读 · 2023年7月19日

Revisiting Softmax for Uncertainty Approximation in Text Classification

Arxiv

0+阅读 · 2023年7月19日

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

Arxiv

0+阅读 · 2023年7月19日

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

Arxiv

0+阅读 · 2023年7月19日

PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models

Arxiv

0+阅读 · 2023年7月18日

A Survey on Evaluation of Large Language Models

Arxiv

0+阅读 · 2023年7月18日

On the application of Large Language Models for language teaching and assessment technology

Arxiv

0+阅读 · 2023年7月17日

In-IDE Generation-based Information Support with a Large Language Model

Arxiv

0+阅读 · 2023年7月17日

Leveraging Large Language Models to Generate Answer Set Programs

Arxiv

0+阅读 · 2023年7月15日

A Survey on Generative Diffusion Model

Arxiv

46+阅读 · 2022年9月6日

相关基金

高原鼢鼠（Myospalax baileyi）扩散机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

AMPK-Beclin-1/Vps34通路在维生素D3（Vit D)诱导足细胞自噬中的作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

DARA效应位错机制的TEM揭示和分子动力学模拟的验证

国家自然科学基金

1+阅读 · 2013年12月31日

介尺度磁性复合囊泡状结构材料的可控构筑及性能

国家自然科学基金

0+阅读 · 2012年12月31日

Nrf2-ARE信号通路在氢气干预新生儿坏死性小肠结肠炎中的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

β钛合金中孪生诱导塑性(TWIP)效应

国家自然科学基金

0+阅读 · 2012年12月31日

关系的分解与Domain的表示

国家自然科学基金

1+阅读 · 2011年12月31日

Cystatin B缺失与Prion疾病自噬作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Wnt/β-catenin通路在内皮抑素抑制动脉粥样硬化斑块内微血管新生中的作用及其机制

国家自然科学基金

0+阅读 · 2011年12月31日

Re合金化镍基单晶高温合金的强韧化机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员