Classic principal-agent problems, such as Stackelberg games, contract design, and Bayesian persuasion, often assume that the agent is able to best respond to the principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot generalized principal-agent problem where the agent approximately best responds. Using this reduction, we show that: (1) If the agent uses contextual no-regret learning algorithms with regret $\mathrm{Reg}(T)$, then the principal can guarantee utility at least $U^* - \Theta\big(\sqrt{\tfrac{\mathrm{Reg}(T)}{T}}\big)$, where $U^*$ is the principal's optimal utility in the classic model with a best-responding agent. (2) If the agent uses contextual no-swap-regret learning algorithms with swap regret $\mathrm{SReg}(T)$, then the principal cannot obtain utility more than $U^* + O\big(\tfrac{\mathrm{SReg}(T)}{T}\big)$. But (3) if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can sometimes do significantly better than $U^*$. These results not only refine previous results on Stackelberg games and contract design, but also lead to new results for Bayesian persuasion with a learning agent and for all generalized principal-agent problems where the agent does not have private information.
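To make the learning model concrete, below is a minimal sketch (not the paper's construction) of an agent running Hedge, the standard multiplicative-weights algorithm, which is both no-regret and mean-based in the sense of result (3). The agent faces a principal who repeats a fixed committed strategy; the payoff vector `u`, the horizon `T`, and the learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 5000, 3                        # rounds, number of agent actions

# Hypothetical agent payoffs for each action under the principal's
# fixed committed strategy; any values in [0, 1] work for the demo.
u = np.array([0.2, 0.5, 0.8])

eta = np.sqrt(np.log(n) / T)          # standard Hedge learning rate
cum = np.zeros(n)                     # cumulative payoff of each action
realized = 0.0                        # agent's realized payoff so far

for _ in range(T):
    # Hedge plays actions with probability proportional to
    # exp(eta * cumulative payoff); subtracting the max stabilizes exp().
    w = np.exp(eta * (cum - cum.max()))
    p = w / w.sum()
    a = rng.choice(n, p=p)
    realized += u[a]
    cum += u                          # full-information feedback update

# Average regret Reg(T)/T against the best fixed action; for Hedge this
# shrinks as O(sqrt(log(n)/T)), matching the no-regret assumption in (1).
avg_regret = u.max() - realized / T
print(f"average regret: {avg_regret:.4f}")
```

Because Hedge plays near-best actions against historical averages, it is also mean-based, which is exactly the property result (3) exploits: against such learners the principal can sometimes extract strictly more than $U^*$.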