Diversifying AI: Towards Creative Chess with AlphaZero

In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called drosophila of AI. We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZ_db. We train AZ_db to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning. Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group, and outperforms a more homogeneous team. Notably, AZ_db solves twice as many challenging puzzles as AZ, including the Penrose positions. When playing chess from different openings, we notice that players in AZ_db specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ. Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans, and that diversity is a valuable asset in solving computationally hard problems.
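To give a sense of scale for the reported 50 Elo improvement, the sketch below converts a rating difference into an expected score per game. It assumes the standard logistic Elo model with a 400-point scale; the abstract does not say how the Elo difference was estimated, so the function name `expected_score` and the choice of model are illustrative assumptions rather than the authors' procedure.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game (win = 1, draw = 0.5, loss = 0) for a player
    rated `elo_diff` points above the opponent.

    Assumption: standard logistic Elo model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # A 50-point advantage, as reported for per-opening player selection over AZ.
    print(f"{expected_score(50.0):.3f}")  # prints 0.571
```

Under this model, a 50 Elo advantage corresponds to an expected score of roughly 57% per game, compared with 50% for two equally rated players.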
