Generalized principal-agent problems, including Stackelberg games, contract design, and Bayesian persuasion, are a class of economic problems where an agent best responds to a principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. Using this reduction, we show that: (1) if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is at least the principal's optimal utility in the classic non-learning model minus the square root of the agent's regret; (2) if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility more than the optimal utility in the non-learning model plus the agent's swap regret. But (3) if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. These general results not only refine previous results in Stackelberg games and contract design with learning agents but also lead to new results for Bayesian persuasion with a learning agent.
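To make the "no-regret learning agent" assumption concrete, the sketch below runs Hedge (multiplicative weights), a standard no-regret algorithm, against an arbitrary loss sequence and checks its regret against the classic deterministic bound sqrt((T ln K)/2). This is an illustrative minimal example, not the paper's construction: the paper's results concern *contextual* no-regret learners in repeated principal-agent interactions, and all variable names here (losses, eta, etc.) are our own.

```python
import math
import random

def hedge_regret(num_actions, losses, eta):
    """Run the Hedge (multiplicative-weights) learner on a loss sequence.

    losses: list of per-round loss vectors, each of length num_actions,
            with entries in [0, 1].
    Returns the learner's regret: expected cumulative loss minus the
    cumulative loss of the best fixed action in hindsight.
    """
    weights = [1.0] * num_actions
    total_loss = 0.0
    for loss in losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        # expected loss of the randomized play this round
        total_loss += sum(p * l for p, l in zip(probs, loss))
        # exponential-weights update
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, loss)]
    best_fixed = min(
        sum(loss[a] for loss in losses) for a in range(num_actions)
    )
    return total_loss - best_fixed

# Illustrative run on a random (here: i.i.d. uniform) loss sequence.
T, K = 2000, 3
random.seed(0)
losses = [[random.random() for _ in range(K)] for _ in range(T)]
eta = math.sqrt(8 * math.log(K) / T)  # the regret-optimal learning rate
regret = hedge_regret(K, losses, eta)
# Hedge guarantees regret <= sqrt((T ln K) / 2) for ANY loss sequence,
# so the agent's average regret vanishes as T grows.
```

The point of the abstract's result (1) is that when the agent's regret is sublinear in this sense, the principal's per-round utility loss relative to the classic commitment benchmark is at most on the order of the square root of that (average) regret.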