GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare

Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet they sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision-making mechanism that benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision-making into both reasoning and training. During reasoning, the model explicitly treats the user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a mutual welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt the LLM's response when the pricing policies of an LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and mutual welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign.
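
The payoff-matrix step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the action names, the payoff numbers, and the use of a geometric mean as the mutual-welfare aggregate are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import sqrt

# Hypothetical payoffs for each assistant action: (user_welfare, llm_welfare).
# These numbers are invented for illustration and do not come from the paper.
PAYOFFS = {
    "direct_answer": (0.8, 0.6),
    "clarifying_question": (0.5, 0.7),
    "verbose_reasoning": (0.4, 0.9),
}


def mutual_welfare(user: float, llm: float) -> float:
    """Assumed aggregate: geometric mean of the two welfare values.

    Returns 0.0 if either side has non-positive welfare, so an action that
    harms one party is never preferred.
    """
    if user <= 0 or llm <= 0:
        return 0.0
    return sqrt(user * llm)


def select_action(payoffs: dict) -> str:
    """Pick the action with the highest assumed mutual welfare."""
    return max(payoffs, key=lambda action: mutual_welfare(*payoffs[action]))


def select_selfish_action(payoffs: dict) -> str:
    """Pick the action that maximizes the LLM's own payoff only."""
    return max(payoffs, key=lambda action: payoffs[action][1])


if __name__ == "__main__":
    print("mutual-welfare choice:", select_action(PAYOFFS))
    print("model-reward-only choice:", select_selfish_action(PAYOFFS))
```

With these made-up numbers, maximizing the model's payoff alone favors verbose reasoning, while the mutual-welfare criterion favors the direct answer. That gap is the kind of mismatch between model reward and user welfare the abstract describes.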

