Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.