We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme, called a contract, in order to induce an agent to take a costly, unobservable action leading to favorable outcomes. We consider a generalization of the classical (single-round) version of the problem in which the principal interacts with the agent by committing to contracts over multiple rounds. The principal has no information about the agent and must learn an optimal contract by observing only the outcome realized at each round. We focus on settings in which the agent's action space is small. We design an algorithm that, when the number of actions is constant, learns an approximately optimal contract with high probability in a number of rounds polynomial in the size of the outcome space. Our algorithm solves an open problem posed by Zhu et al. [2022]. Moreover, it can be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting, in which the principal aims to maximize their cumulative utility, thus considerably improving on previously known regret bounds.