We study how an agent in a two-player repeated game can effectively use potentially imperfect advice when interacting with a no-regret learner. We characterize the advice landscape by introducing a pseudo-metric that quantifies the usefulness of an advice instance, and we demonstrate its applicability through two forms of advice: simulators and payoff matrix predictions. We then show how an optimizing player, equipped with correctness guarantees on the advice, can leverage simulators to compute approximate Stackelberg strategies more efficiently, reducing the interaction complexity traditionally required and illustrating the power of good advice. Finally, we extend our analysis to settings where the advice carries no guarantee of correctness. We find that, in general, a player cannot simultaneously guarantee near-Stackelberg performance when the advice is approximately accurate and a no-regret condition when the advice is inaccurate. We do show, however, that an advice-aided player can obtain utility that weakly dominates their utility in some (coarse) correlated equilibria.