We initiate the study of structured Stackelberg games, a novel form of strategic interaction between a leader and a follower in which contextual information can be predictive of the follower's (unknown) type. Motivated by applications such as security games and AI safety, we show how this additional structure can help the leader learn a utility-maximizing policy in both the online and distributional settings. In the online setting, we first prove that standard learning-theoretic measures of complexity do not characterize the difficulty of the leader's learning task. Nevertheless, we show that there exists a learning-theoretic measure of complexity, analogous to the Littlestone dimension in online classification, that tightly characterizes the leader's instance-optimal regret. We term this the Stackelberg-Littlestone dimension and leverage it to provide a provably optimal online learning algorithm. In the distributional setting, we provide analogous results by showing that two new dimensions yield upper and lower bounds on the sample complexity.
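As a toy illustration of the Stackelberg interaction described above, the sketch below has a leader commit to a mixed strategy over two actions, after which a follower (of a single, known type) best-responds; the leader then optimizes its commitment by coarse grid search. The payoff matrices and the grid search are hypothetical placeholders for illustration only, not the paper's setting or algorithm (in particular, there is no contextual information or unknown follower type here).

```python
import numpy as np

# Hypothetical 2x2 payoffs (illustration only).
# Rows: leader actions; columns: follower actions.
leader_payoff = np.array([[2.0, 4.0],
                          [1.0, 3.0]])
follower_payoff = np.array([[1.0, 0.0],
                            [0.0, 2.0]])

def follower_best_response(x):
    """Follower observes the leader's committed mixed strategy x
    and plays the action maximizing its own expected payoff."""
    expected = x @ follower_payoff  # expected follower payoff per action
    return int(np.argmax(expected))

def leader_utility(x):
    """Leader's expected utility under the follower's best response."""
    j = follower_best_response(x)
    return float(x @ leader_payoff[:, j])

# Coarse grid search over the leader's commitment (a sketch, not the
# learning algorithms studied in the paper).
grid = np.linspace(0.0, 1.0, 101)
best_p = max(grid, key=lambda p: leader_utility(np.array([p, 1.0 - p])))
```

In the paper's setting the follower's type is unknown and correlated with observed context, so the leader must learn a context-to-commitment policy rather than solve a single known game as above.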