Stackelberg equilibria arise naturally in a range of popular learning problems, such as security games and indirect mechanism design, and have received increasing attention in the reinforcement learning (RL) literature. We present a general framework for implementing Stackelberg equilibrium search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space admits approaches not previously seen in the literature, for instance leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.
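To make the leader-follower structure concrete, here is a toy, brute-force sketch of strong Stackelberg equilibrium search in a hypothetical 2x2 matrix game. It is not the framework described above. The payoffs, the grid over leader commitments, and the helper names (`follower_best_response`, `stackelberg_search`) are illustrative assumptions. The exact inner best response stands in for the follower training that an RL formulation would run until convergence.

```python
"""Toy strong Stackelberg equilibrium search in a 2x2 matrix game.

Hypothetical example: payoffs, grid resolution and tie-breaking rule are
illustrative assumptions, not taken from the paper.
"""

# Rows are leader actions (U, D); columns are follower actions (L, R).
LEADER_PAYOFF = [[2.0, 4.0], [1.0, 3.0]]
FOLLOWER_PAYOFF = [[1.0, 0.0], [0.0, 1.0]]


def expected_payoffs(payoff, mix):
    """Expected payoff of each follower action against the leader's mixed strategy."""
    return [sum(mix[i] * payoff[i][j] for i in range(len(mix)))
            for j in range(len(payoff[0]))]


def follower_best_response(mix, tol=1e-9):
    """Inner problem: the follower best-responds to the committed leader strategy.

    Ties are broken in the leader's favour (the 'strong' Stackelberg convention).
    In an RL formulation, this exact computation would be replaced by training
    the follower until it converges.
    """
    follower_vals = expected_payoffs(FOLLOWER_PAYOFF, mix)
    leader_vals = expected_payoffs(LEADER_PAYOFF, mix)
    best = max(follower_vals)
    candidates = [j for j, v in enumerate(follower_vals) if v >= best - tol]
    return max(candidates, key=lambda j: leader_vals[j])


def stackelberg_search(grid_size=100):
    """Outer problem: the leader picks the commitment that maximises its payoff,
    anticipating the follower's best response."""
    best = None
    for k in range(grid_size + 1):
        p = k / grid_size  # probability that the leader plays U
        mix = [p, 1.0 - p]
        response = follower_best_response(mix)
        value = expected_payoffs(LEADER_PAYOFF, mix)[response]
        if best is None or value > best[2]:
            best = (p, response, value)
    return best


if __name__ == "__main__":
    p, response, value = stackelberg_search()
    print(f"leader plays U with p={p:.2f}, follower plays {'LR'[response]}, "
          f"leader value={value:.2f}")
```

In this game, U strictly dominates D for the leader, so the simultaneous-move Nash equilibrium (U, L) gives the leader a payoff of 2. Committing to play U with probability 0.5 instead induces the follower to play R and raises the leader's payoff to 3.5. This gap is the value of commitment that Stackelberg search exploits.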