We consider regret minimization in repeated games with a very large number of actions. Such games arise naturally in AI Safety via Debate and, more generally, in any game whose actions are language-based. Existing algorithms for online game playing require per-iteration computation polynomial in the number of actions, which can be prohibitive for large games. We therefore consider oracle-based algorithms, since oracles naturally model access to AI agents. With oracle access, we characterize when internal and external regret can be minimized efficiently. We give a novel efficient algorithm for internal regret minimization whose regret and per-iteration computation depend logarithmically on the number of actions. We conclude with experiments in the setting of AI Safety via Debate that show the benefit of insights from our algorithmic analysis.
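To make the cost of existing methods concrete, the sketch below implements the classical Blum–Mansour reduction from internal to external regret. It is the standard polynomial-time baseline, **not** the oracle-based algorithm proposed in this work. It keeps one Hedge learner per action (an n × n weight table) and solves an n × n linear system every round for the stationary distribution, so each iteration costs time polynomial in the number of actions n. The class name `BlumMansourSketch`, the learning rate `eta`, and the choice of Hedge as the external-regret subroutine are illustrative assumptions.

```python
import numpy as np


class BlumMansourSketch:
    """Hypothetical sketch of the Blum-Mansour internal-regret reduction.

    One Hedge (multiplicative-weights) learner per action; row i of the
    n x n table holds the log-weights of the learner responsible for action i.
    Per-round cost is polynomial in n: O(n^2) memory and an n x n solve.
    """

    def __init__(self, n: int, eta: float):
        self.n = n
        self.eta = eta
        self.log_w = np.zeros((n, n))
        self.p = np.full(n, 1.0 / n)

    def row_distributions(self) -> np.ndarray:
        # Softmax of each row: Q[i] is learner i's distribution over actions.
        w = np.exp(self.log_w - self.log_w.max(axis=1, keepdims=True))
        return w / w.sum(axis=1, keepdims=True)

    def play(self) -> np.ndarray:
        Q = self.row_distributions()
        # Stationary distribution p = p Q, i.e. (Q^T - I) p = 0 with sum(p) = 1.
        A = np.vstack([Q.T - np.eye(self.n), np.ones(self.n)])
        b = np.zeros(self.n + 1)
        b[-1] = 1.0
        p, *_ = np.linalg.lstsq(A, b, rcond=None)
        p = np.clip(p, 0.0, None)  # remove tiny negative values from round-off
        self.p = p / p.sum()
        return self.p

    def update(self, loss: np.ndarray) -> None:
        # Learner i receives the loss vector scaled by p[i] (Hedge update).
        self.log_w -= self.eta * np.outer(self.p, loss)
```

In a repeated game, each round calls `play()` to obtain a mixed strategy and then `update(loss)` with the observed loss vector. The n × n table and linear solve are exactly the per-iteration dependence on the number of actions that becomes prohibitive when actions are, for example, natural-language arguments.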