Incentive design studies how a central authority can influence strategic agents through payments, subsidies, or taxes, so that individual objectives align with collective welfare. This paper introduces a No-Regret Adaptive Incentive Design (RAID) framework for nonlinear games with continuous action spaces and private agent costs. In this framework, the authority (planner) designs incentives that steer the Nash equilibrium toward a socially optimal action profile, while simultaneously learning the agents' unknown preferences from their repeated strategic responses. We formulate the RAID problem and construct a least-squares estimator whose strong consistency requires only diminishing excitation. Leveraging this weak excitation requirement, we propose a switching incentive policy that alternates between probing (exploration) and estimate-based (exploitation) incentives. The resulting policy achieves an $O(t^{-0.5})$ parameter estimation rate and accumulates $O(t^{0.5}\log t)$ squared social-cost regret, almost surely. We further extend the framework to an endogenous-noise response model, in which standard least-squares estimation is biased because of an errors-in-variables correlation between the noise and the agent responses. For this model, we use a repeated-sampling estimator and a corresponding switching policy that retain the same almost-sure convergence and regret rates. Numerical experiments validate the effectiveness of the method and its predicted convergence rates.
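To make the idea of a switching incentive policy concrete, the following is a minimal toy sketch and not the paper's algorithm. It assumes a single agent whose response to an incentive p is affine, x = a + b*p + noise, with unknown (a, b); the paper itself treats nonlinear multi-agent games. The probing schedule (probe at perfect-square time steps, so the number of probes grows roughly like the square root of t), the function name `run_switching_policy`, the clamp on the slope estimate, and the regret definition used here are all illustrative assumptions.

```python
import numpy as np


def run_switching_policy(T=2000, theta_true=(1.0, 2.0), x_star=3.0,
                         noise_std=0.1, seed=0):
    """Toy switching policy: sparse random probing plus least-squares exploitation.

    Assumed response model (hypothetical, for illustration only):
        x_t = a + b * p_t + w_t,  with theta_true = (a, b) unknown to the planner.
    Returns the final least-squares estimate and the cumulative squared
    deviation of the noise-free response from the target x_star.
    """
    rng = np.random.default_rng(seed)
    a, b = theta_true

    # Normal-equation accumulators; the tiny ridge term keeps G invertible
    # before two distinct incentives have been observed.
    G = 1e-6 * np.eye(2)
    h = np.zeros(2)
    theta_hat = np.array([0.0, 1.0])  # arbitrary initial guess

    k = 1            # next probe happens at time k*k
    regret = 0.0
    for t in range(1, T + 1):
        if t == k * k:
            # Probing (exploration): random incentive, used less and less often.
            p = rng.uniform(-1.0, 1.0)
            k += 1
        else:
            # Estimate-based (exploitation): certainty-equivalent incentive.
            b_hat = theta_hat[1]
            if abs(b_hat) < 1e-3:
                b_hat = 1e-3  # avoid dividing by a near-zero slope estimate
            p = (x_star - theta_hat[0]) / b_hat

        # Agent responds; the planner only observes the noisy response x.
        x = a + b * p + noise_std * rng.standard_normal()

        # Least-squares update with regressor phi = (1, p).
        phi = np.array([1.0, p])
        G += np.outer(phi, phi)
        h += phi * x
        theta_hat = np.linalg.solve(G, h)

        # Toy regret: squared gap between noise-free response and target.
        regret += (a + b * p - x_star) ** 2

    return theta_hat, regret


if __name__ == "__main__":
    est, reg = run_switching_policy()
    print("estimate:", est, "cumulative squared regret:", reg)
```

In this sketch the probing steps supply the excitation that keeps the least-squares problem well conditioned, while the exploitation steps keep the response near the target. How often to probe is the design choice that trades estimation speed against regret, and the perfect-square schedule here is only one plausible choice.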