With recent advancements in large language models (LLMs), alignment has emerged as an effective technique for keeping LLMs consistent with human intent. Current methods primarily rely on direct training through Supervised Fine-tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), both of which require substantial computational resources and extensive ground-truth data. This paper explores an efficient method for aligning black-box large models using smaller models, introducing a model-agnostic and lightweight Bayesian Persuasion Alignment framework. We formalize this problem as an optimization of the signaling strategy from the small model's perspective. In the persuasion process, the small model (Advisor) observes the information item (i.e., the state) and persuades the large model (Receiver) to elicit improved responses. The Receiver then generates a response based on the input, the signal from the Advisor, and its updated belief about the information item. Through training with our framework, we demonstrate that the Advisor can significantly enhance the performance of various Receivers across a range of tasks. We theoretically analyze our persuasion framework and provide an upper bound on the Advisor's regret, confirming its effectiveness in learning the optimal signaling strategy. Empirically, we demonstrate that GPT-2 can significantly improve the performance of various models, achieving an average enhancement of 16.1% in mathematical reasoning ability and 13.7% in code generation. We hope our work provides an initial step toward rethinking the alignment framework from the perspective of Bayesian persuasion.
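To make the belief-update step concrete, the following is a minimal sketch of the Receiver's posterior in the standard Bayesian persuasion setup the abstract alludes to; the symbols below (prior $\mu_0$ over states $\omega \in \Omega$, signaling strategy $\pi$, signal $s$) are our own illustrative notation, not necessarily the paper's.

```latex
% A minimal sketch of the Receiver's belief update via Bayes' rule,
% assuming a standard Bayesian persuasion setup: \mu_0 is the prior over
% states, \pi(s \mid \omega) is the Advisor's signaling strategy, and
% \mu(\omega \mid s) is the posterior the Receiver conditions its
% response on. All symbol names here are illustrative assumptions.
\[
\mu(\omega \mid s)
  \;=\;
  \frac{\pi(s \mid \omega)\,\mu_0(\omega)}
       {\sum_{\omega' \in \Omega} \pi(s \mid \omega')\,\mu_0(\omega')}
\]
```

Under this reading, optimizing the Advisor's signaling strategy $\pi$ amounts to shaping the posteriors the Receiver forms, which is what the framework trains the small model to do.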