We consider regret minimization in repeated games with a very large number of actions. Such games arise naturally in the setting of AI Safety via Debate \cite{irving2018ai}, and more generally in games whose actions are language-based. Existing algorithms for online game playing require per-iteration computation polynomial in the number of actions, which can be prohibitive for large games. We therefore consider oracle-based algorithms, since oracles naturally model access to AI agents. Given oracle access, we characterize when internal and external regret can be minimized efficiently. We give a novel efficient algorithm for simultaneous external and internal regret minimization whose regret depends logarithmically on the number of actions. We conclude with experiments in the setting of AI Safety via Debate that show the benefit of insights from our algorithmic analysis.