Counterfactual regret minimization (CFR) is a family of no-regret learning algorithms capable of solving large-scale imperfect-information games. We propose implementing CFR as a series of dense and sparse matrix and vector operations, which makes it highly parallelizable on a graphics processing unit (GPU) at the cost of higher memory usage. Our experiments show that our implementation runs up to about 401.2 times faster than OpenSpiel's Python implementation and, on an expanded set of games, up to about 203.6 times faster than OpenSpiel's C++ implementation. The speedup becomes more pronounced as the size of the game being solved grows.
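To give a flavor of what expressing counterfactual regret minimization as matrix and vector operations can look like, the following is a minimal sketch of one ingredient only: the regret-matching policy update, computed for all information sets at once. It is an illustration under stated assumptions, not the implementation evaluated here. The function name `regret_matching`, the membership matrix `M`, and the toy numbers are all hypothetical. `M` is stored densely with NumPy for brevity; in a large game it would naturally be a sparse matrix, and the same products would run on a GPU array library.

```python
import numpy as np


def regret_matching(M, regrets):
    """Regret matching for every information set in one pass.

    M       : (num_infosets, num_actions) 0/1 matrix (hypothetical layout);
              M[i, j] = 1 iff action j belongs to information set i.
    regrets : (num_actions,) cumulative regrets, one entry per action.
    Returns a (num_actions,) vector of action probabilities.
    """
    positive = np.maximum(regrets, 0.0)

    # Sum positive regrets within each information set, then scatter the
    # per-infoset sum back onto that infoset's actions.
    action_sums = M.T @ (M @ positive)

    # Number of actions in each action's information set, used for the
    # uniform fallback when no action has positive regret.
    action_counts = M.T @ (M @ np.ones_like(regrets))
    uniform = 1.0 / action_counts

    # Avoid division by zero where an infoset has no positive regret.
    safe_sums = np.where(action_sums > 0, action_sums, 1.0)
    return np.where(action_sums > 0, positive / safe_sums, uniform)


# Toy example: infoset 0 owns actions 0-1, infoset 1 owns actions 2-4.
M = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1]], dtype=float)
regrets = np.array([3.0, 1.0, -2.0, 0.0, -1.0])
policy = regret_matching(M, regrets)
```

Because the update contains no loop over information sets, its cost is dominated by a few matrix-vector products. That is the property that makes this style of formulation amenable to parallel hardware, at the price of materializing `M` and the intermediate vectors in memory.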