The reward design problem concerns the interaction between a leader and a follower, in which the leader seeks to shape the follower's behavior, and thereby maximize the leader's own payoff, by modifying the follower's reward function. Current approaches to reward design rely on an accurate model of how the follower responds to reward modifications and can therefore be sensitive to modeling inaccuracies. To address this sensitivity, we present a solution that is robust to uncertainties in modeling the follower, including 1) how the follower breaks ties when its best response is not unique, 2) inexact knowledge of how the follower perceives reward modifications, and 3) bounded rationality of the follower. Our robust solution is guaranteed to exist under mild conditions and can be computed numerically by solving a mixed-integer linear program. Numerical experiments on multiple test cases demonstrate that our solution is more robust than the standard approach without incurring significant additional computational cost.
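To make the tie-breaking and bounded-rationality issues concrete, the following is a minimal sketch in a single-decision setting. It is an illustration only and not the paper's formulation or its mixed-integer linear program: the two-action example, the function names (`apply_bonus`, `best_responses`, `optimistic_value`, `worst_case_value`), and the tolerance `tol` used to model bounded rationality are all assumptions made for this example.

```python
# Illustrative toy model (hypothetical, not the paper's method):
# a follower picks one action; the leader adds a bonus to the follower's rewards.

def apply_bonus(follower_reward, bonus):
    """Follower rewards after the leader's modification."""
    return [r + b for r, b in zip(follower_reward, bonus)]

def best_responses(follower_reward, tol=0.0):
    """Actions whose reward is within `tol` of the best one.

    tol = 0 gives the exact best-response set (possibly several tied actions);
    tol > 0 is an assumed model of a boundedly rational follower.
    """
    best = max(follower_reward)
    return [a for a, r in enumerate(follower_reward) if r >= best - tol]

def optimistic_value(leader_payoff, follower_reward):
    """Leader payoff if ties are broken in the leader's favor."""
    return max(leader_payoff[a] for a in best_responses(follower_reward))

def worst_case_value(leader_payoff, follower_reward, tol=0.0):
    """Leader payoff if the follower picks the worst acceptable action."""
    return min(leader_payoff[a] for a in best_responses(follower_reward, tol))

follower_base = [1.0, 0.0]   # the follower initially prefers action 0
leader_payoff = [0.0, 1.0]   # the leader wants action 1

# A bonus that only creates a tie looks good under optimistic tie-breaking
# but gives no guarantee in the worst case.
tie = apply_bonus(follower_base, [0.0, 1.0])
tie_optimistic = optimistic_value(leader_payoff, tie)
tie_worst = worst_case_value(leader_payoff, tie)

# A bonus with a margin larger than the follower's tolerance is robust.
margin = apply_bonus(follower_base, [0.0, 1.5])
margin_worst = worst_case_value(leader_payoff, margin, tol=0.25)

print(tie_optimistic, tie_worst, margin_worst)
```

In this toy case the tie-inducing bonus is cheaper but fragile, whereas the larger bonus secures the desired action even when the follower accepts any action within `tol` of its best reward. A robust design trades off modification cost against this kind of worst-case guarantee.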