Incentive design is a foundational paradigm for influencing the behavior of strategic agents: a system planner (the principal) publicly commits to an incentive mechanism designed to align individual objectives with collective social welfare. This paper introduces the Regret-Minimizing Adaptive Incentive Design (RAID) problem, which aims to synthesize incentive laws under information asymmetry while achieving asymptotically minimal regret relative to an oracle with full information. To this end, we develop the RAID algorithm, which employs a switching policy that alternates between probing (exploration) and estimate-based incentivization (exploitation). The associated type estimator relies only on the weak excitation condition required for strong consistency of least-squares estimation, substantially relaxing the persistence-of-excitation assumptions previously used in adaptive incentive design. We establish the strong consistency of the proposed type estimator and prove that the resulting incentive asymptotically minimizes the planner's average regret almost surely. Numerical experiments illustrate the convergence rate of the proposed methodology.
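To make the probing/exploitation switch concrete, the following is a minimal toy sketch, not the RAID algorithm itself. Everything in it is an assumption made for illustration: the agent's response is taken to be linear in a two-dimensional incentive, `y = theta . u + noise`; the planner's loss is a quadratic tracking error plus an incentive cost; probing happens at perfect-square time steps; and the names `solve_2x2`, `planner_loss`, `best_incentive`, and `run_switching_policy` are hypothetical. The type estimate is an ordinary least-squares fit with a small ridge term for invertibility.

```python
import math
import random


def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix `a` (nested lists) via Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


def planner_loss(theta, u, target, cost):
    """Assumed loss: squared gap of the mean response to the target, plus incentive cost."""
    mean_response = theta[0] * u[0] + theta[1] * u[1]
    return (mean_response - target) ** 2 + cost * (u[0] ** 2 + u[1] ** 2)


def best_incentive(theta, target, cost):
    """Minimizer of planner_loss when the type `theta` is taken as given."""
    scale = target / (theta[0] ** 2 + theta[1] ** 2 + cost)
    return [scale * theta[0], scale * theta[1]]


def run_switching_policy(theta, target=1.0, cost=0.25, horizon=5000,
                         noise_std=0.1, seed=0):
    """Return (final type estimate, average regret against the full-information oracle)."""
    rng = random.Random(seed)
    # Least-squares accumulators; the small ridge keeps `gram` invertible early on.
    gram = [[1e-3, 0.0], [0.0, 1e-3]]
    moment = [0.0, 0.0]
    theta_hat = [0.0, 0.0]
    oracle_loss = planner_loss(theta, best_incentive(theta, target, cost), target, cost)
    cumulative_regret = 0.0

    for t in range(1, horizon + 1):
        root = math.isqrt(t)
        if root * root == t:
            # Probing (exploration): a random unit-norm incentive at sparse times.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            u = [math.cos(angle), math.sin(angle)]
        else:
            # Exploitation: act as if the current estimate were the true type.
            u = best_incentive(theta_hat, target, cost)

        # The agent's noisy response is the only thing the planner observes.
        y = theta[0] * u[0] + theta[1] * u[1] + rng.gauss(0.0, noise_std)
        cumulative_regret += planner_loss(theta, u, target, cost) - oracle_loss

        # Update the normal equations and refit the type estimate.
        for i in range(2):
            moment[i] += u[i] * y
            for j in range(2):
                gram[i][j] += u[i] * u[j]
        theta_hat = solve_2x2(gram, moment)

    return theta_hat, cumulative_regret / horizon


if __name__ == "__main__":
    estimate, average_regret = run_switching_policy([0.8, -0.6])
    print("type estimate:", estimate)
    print("average regret:", average_regret)
```

In this toy model the exploitation incentives all point along the current estimate, so on their own they reveal little about the orthogonal component of the type. The sparse probing steps supply that missing excitation, while their share of the horizon shrinks (roughly the square root of the horizon), so their contribution to the average regret fades. The perfect-square schedule is only one convenient choice for the sketch and is not taken from the paper.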