Regret minimization is a key component of many algorithms for finding Nash equilibria in imperfect-information games. To scale to games that cannot fit in memory, we can use search with value functions. However, calling the value functions repeatedly during search can be expensive, so it is desirable to minimize regret in the search tree as fast as possible. We propose to accelerate regret minimization by introducing a general ``learning not to regret'' framework, in which we meta-learn the regret minimizer. The resulting algorithm is guaranteed to minimize regret in arbitrary settings and is meta-learned to converge fast on a selected distribution of games. Our experiments show that meta-learned algorithms converge substantially faster than prior regret minimization algorithms.
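To make the term concrete, the following is a minimal sketch of regret matching, a standard regret minimizer of the kind the abstract groups under prior algorithms. It is not the meta-learned algorithm proposed here. The function names, the random reward sequence, and the three-action setting are all assumptions chosen for illustration.

```python
import random


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / n] * n
    return [p / total for p in positive]


def run_regret_matching(rewards):
    """Run regret matching on a sequence of reward vectors.

    Returns the external regret after the last step, i.e. how much better
    the best fixed action would have done than the strategies played.
    """
    n = len(rewards[0])
    cum_regret = [0.0] * n
    for x in rewards:
        sigma = regret_matching_strategy(cum_regret)
        expected = sum(s * r for s, r in zip(sigma, x))
        for a in range(n):
            # Instantaneous regret of action a against the mixed strategy.
            cum_regret[a] += x[a] - expected
    return max(cum_regret)


# Illustrative usage: rewards in [0, 1] drawn from a seeded generator.
rng = random.Random(0)
T, num_actions = 1000, 3
rewards = [[rng.random() for _ in range(num_actions)] for _ in range(T)]
regret = run_regret_matching(rewards)
print("average regret per step:", regret / T)
```

For rewards in [0, 1], the regret of regret matching grows at most as the square root of (number of actions × T), so the average regret per step shrinks toward zero as T grows. Each step of such a loop corresponds to one evaluation of the rewards, which is why, in search with value functions, reducing the number of steps needed matters.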