When interacting with other non-competitive decision-making agents, it is critical for an autonomous agent to have inferable behavior: its actions must convey its intention and strategy. For example, an autonomous car's strategy must be inferable by the pedestrians interacting with the car. We model the inferability problem using a repeated bimatrix Stackelberg game with observations, in which a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed, potentially mixed strategy. The follower, on the other hand, does not know the leader's strategy and dynamically reacts based on observations of the leader's previous actions. In the setting with observations, the leader may suffer from an inferability loss, i.e., a drop in performance compared to the setting where the follower has perfect information about the leader's strategy. We show that the inferability loss is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. As a converse result, we also provide a game where the required number of interactions is lower-bounded by a function of the desired inferability loss.
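The setting described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's model in full: the abstract does not specify how the follower reacts to observations, so the sketch assumes the follower best-responds to the empirical frequency of the leader's past actions, starting from a uniform belief. The function names, the payoff matrices, and the per-round averaging of the leader's expected payoff are all hypothetical choices made for illustration.

```python
import random


def best_response(belief, follower_payoff):
    """Follower action maximizing expected payoff under a belief over leader actions."""
    n_follower = len(follower_payoff[0])
    expected = [
        sum(belief[i] * follower_payoff[i][j] for i in range(len(belief)))
        for j in range(n_follower)
    ]
    # Ties are broken in favor of the lowest action index.
    return max(range(n_follower), key=lambda j: expected[j])


def leader_value(strategy, leader_payoff, follower_action):
    """Leader's expected one-round payoff against a fixed follower action."""
    return sum(p * leader_payoff[i][follower_action] for i, p in enumerate(strategy))


def simulate_inferability_loss(strategy, leader_payoff, follower_payoff, horizon, seed=0):
    """Estimate the leader's average per-round loss relative to full information.

    Assumption (not from the source): the follower best-responds to the
    empirical distribution of the leader's previous actions.
    """
    rng = random.Random(seed)
    n_leader = len(strategy)
    counts = [0] * n_leader

    # Benchmark: the follower knows the leader's strategy exactly.
    full_info = leader_value(
        strategy, leader_payoff, best_response(strategy, follower_payoff)
    )

    total = 0.0
    for t in range(horizon):
        if t == 0:
            belief = [1.0 / n_leader] * n_leader  # assumed uniform initial belief
        else:
            belief = [c / t for c in counts]  # empirical frequencies so far
        b = best_response(belief, follower_payoff)
        total += leader_value(strategy, leader_payoff, b)
        # The leader samples an action from its fixed strategy; the follower observes it.
        a = rng.choices(range(n_leader), weights=strategy)[0]
        counts[a] += 1
    return full_info - total / horizon


if __name__ == "__main__":
    # Hypothetical 2x2 coordination game: both players gain when actions match.
    A = [[1.0, 0.0], [0.0, 1.0]]  # leader payoffs
    B = [[1.0, 0.0], [0.0, 1.0]]  # follower payoffs
    for horizon in (10, 1000):
        print(horizon, simulate_inferability_loss([0.6, 0.4], A, B, horizon))
```

In this toy game a pure leader strategy is inferred after a single observation, while a mixed strategy close to the follower's indifference point can keep misleading the follower for many rounds, which matches the qualitative message that lower stochasticity aids inferability.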