Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation


Seamus Somerstep, Vinod Raman, Unique Subedi, Yuekai Sun

From arXiv; AISTATS 2026 camera-ready

Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, supervised fine-tuning (SFT), trains a new next-token predictor on good generations. The second, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that SFT outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in n or a rate of convergence with better dependence on the response length.
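To make the two procedures concrete, here is a minimal Python sketch on bit strings. It is an illustration under stated assumptions, not the paper's construction: the base model is taken to be independent fair coin flips, the reward is a fixed stand-in (`count_ones`) rather than a learned reward model, and the "fine-tuned" predictor ignores the prefix and only estimates a per-position probability of a 1. All function names are hypothetical.

```python
import random
from typing import Callable, List

BitString = str


def sample_base_model(length: int, rng: random.Random) -> BitString:
    """Assumed base model: every bit is an independent fair coin flip."""
    return "".join(rng.choice("01") for _ in range(length))


def best_of_n(reward: Callable[[BitString], float], n: int,
              length: int, rng: random.Random) -> BitString:
    """Best-of-N: draw n responses from the unaltered base model and
    return the one the reward function scores highest."""
    candidates = [sample_base_model(length, rng) for _ in range(n)]
    return max(candidates, key=reward)


def fit_sft(good_strings: List[BitString]) -> List[float]:
    """Toy stand-in for supervised fine-tuning: estimate, for each position,
    the fraction of good generations that have a 1 there.
    (A real next-token predictor would also condition on the prefix.)"""
    length = len(good_strings[0])
    total = len(good_strings)
    return [sum(s[t] == "1" for s in good_strings) / total
            for t in range(length)]


def sample_sft(probs: List[float], rng: random.Random) -> BitString:
    """Generate one response from the fitted per-position probabilities."""
    return "".join("1" if rng.random() < p else "0" for p in probs)


def count_ones(s: BitString) -> int:
    """Stand-in reward (an assumption): number of 1 bits."""
    return s.count("1")


if __name__ == "__main__":
    rng = random.Random(0)
    print("BoN:", best_of_n(count_ones, n=16, length=8, rng=rng))
    probs = fit_sft(["11110000", "11100000", "11111000"])
    print("SFT:", sample_sft(probs, rng))
```

The contrast the sketch is meant to show: BoN leaves the generator untouched and spends its effort on selection at inference time, while SFT changes the generator itself and then samples once.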
