We study a ubiquitous learning challenge in online principal-agent problems, in which the principal learns the agent's private information from the agent's revealed preferences in historical interactions. This paradigm includes important special cases such as pricing and contract design, which have been widely studied in recent literature. However, existing work considers the case where the principal can choose only a single strategy in each round to interact with the agent, and then observes the agent's revealed preference through the agent's actions. In this paper, we extend this line of study by allowing the principal to offer a menu of strategies to the agent and to learn additionally from observing the agent's selection from the menu. We provide a thorough investigation of several online principal-agent problem settings and characterize their sample complexities, together with corresponding algorithms. We instantiate this paradigm in several important design problems, including Stackelberg (security) games, contract design, and information design. Finally, we explore the connection between our findings and existing results on online learning in Stackelberg games, and we offer a solution that overcomes a key hard instance of Peng et al. (2019).