Institutional incentives are widely used to promote cooperation among autonomous, self-regarding agents, from human societies to multi-agent and AI systems. Existing work typically treats incentive design as a bi-objective problem: minimise institutional cost while achieving a high long-run frequency of cooperation. Whether such schemes also maximise social welfare, defined as total population payoff net of institutional expenditure, has remained largely unexplored. We develop a welfare-centric framework for institutional incentives in finite, well-mixed populations playing a social dilemma (the Donation Game or the Public Goods Game), considering both rewards for cooperators and punishments for defectors. For each mechanism, we derive explicit expressions for expected social welfare and characterise how it depends on incentive efficiency and selection intensity. Analytically, we identify parameter regimes where social welfare has a single optimal incentive level and regimes with qualitative phase transitions, in which welfare becomes non-monotonic with multiple local optima. We prove that any welfare-maximising incentive is either zero or concentrated around a simple closed-form target, and we provide an efficient algorithm to compute these optima. Comparing reward and punishment, we further derive closed-form conditions under which reward outperforms punishment in terms of social welfare for any given budget. Overall, our results reveal a systematic gap between incentives optimised for cost or cooperation frequency and those that maximise welfare.