In 2021, Adam Zsolt Wagner proposed an approach to disprove conjectures in graph theory using Reinforcement Learning (RL). Wagner's idea can be framed as follows: consider a conjecture, for instance that a certain quantity f(G) is negative for every graph G; one can then play a single-player graph-building game, where at each turn the player decides whether or not to add an edge. The game ends when all potential edges have been considered, resulting in a certain graph G_T, and f(G_T) is the final score of the game; RL is then used to maximize this score. This brilliant idea is as simple as it is innovative, and it lends itself to systematic generalization. Several different single-player graph-building games can be employed, along with various RL algorithms. Moreover, since RL maximizes the cumulative reward, step-by-step rewards can be used instead of a single final score, provided the final cumulative reward represents the quantity of interest f(G_T). In this paper, we discuss these and various other choices that can be significant in Wagner's framework. As a contribution to this systematization, we present four distinct single-player graph-building games. Each game employs both a step-by-step reward system and a single final score. We also propose a principled approach to selecting the most suitable neural network architecture for any given conjecture, and introduce a new dataset of graphs labeled with their Laplacian spectra. Furthermore, we provide a counterexample to a conjecture regarding the sum of the matching number and the spectral radius, which is simpler than the example in Wagner's original paper. The games have been implemented as environments in the Gymnasium framework and, along with the dataset, are available as open-source supplementary materials.
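The graph-building game described above can be sketched in a few lines of plain Python. The class and names below are illustrative only, assuming a generic score function f; the paper's actual environments follow the Gymnasium API, and a real f would encode the conjectured inequality.

```python
import itertools
import random

class EdgeGame:
    """Minimal sketch of Wagner's single-player game: visit each potential
    edge of an n-vertex graph once, in a fixed order, and decide whether to
    include it. The final score is f(G_T) for a user-supplied function f.
    (Hypothetical class, not the paper's Gymnasium implementation.)"""

    def __init__(self, n, score_fn):
        self.n = n
        self.score_fn = score_fn  # maps the chosen edge set to a number
        self.reset()

    def reset(self):
        self.slots = list(itertools.combinations(range(self.n), 2))
        self.t = 0          # index of the edge slot currently being decided
        self.edges = set()  # edges chosen so far
        return self.t

    def step(self, action):
        """action: 1 = add the current edge, 0 = skip it."""
        if action:
            self.edges.add(self.slots[self.t])
        self.t += 1
        done = self.t == len(self.slots)
        # single-final-score variant: reward only on the last step
        reward = self.score_fn(self.edges) if done else 0.0
        return self.t, reward, done

# Toy score: the number of edges; a random policy plays one episode.
game = EdgeGame(4, score_fn=len)
done, total = False, 0.0
while not done:
    _, r, done = game.step(random.choice([0, 1]))
    total += r
```

The step-by-step reward variant mentioned in the abstract would instead return, at each step, the change in f between consecutive partial graphs, so that the rewards telescope to f(G_T).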