We study a ubiquitous learning challenge in online principal-agent problems, in which the principal learns the agent's private information from the agent's revealed preferences in historical interactions. This paradigm includes important special cases such as pricing and contract design, which have been widely studied in recent literature. However, existing work considers the case where the principal can choose only a single strategy in each round to interact with the agent and then observes the agent's revealed preference through the agent's actions. In this paper, we extend this line of study by allowing the principal to offer a menu of strategies to the agent and to learn additionally from observing the agent's selection from the menu. We provide a thorough investigation of several online principal-agent problem settings and characterize their sample complexities, accompanied by corresponding algorithms. We instantiate this paradigm in several important design problems, including Stackelberg (security) games, contract design, and information design. Finally, we explore the connection between our findings and existing results on online learning in Stackelberg games, and we offer a solution that can overcome a key hard instance of Peng et al. (2019).