In Changjun Fan et al. [Nature Communications https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep reinforcement learning approach to augment combinatorial optimization heuristics. In particular, they report results for several spin-glass ground-state problems, whose instances on non-planar networks are generally NP-hard. They compare these results with several Monte Carlo based methods, such as simulated annealing (SA) and parallel tempering (PT). Those results do show that reinforcement learning improves on the results of SA or PT, or at least reaches results of comparable quality in shorter runtimes. To support the conclusion that their method is "superior", the authors pursue two basic strategies: (1) the commercial GUROBI solver is used to obtain a sample of exact ground states as a testbed for comparison, and (2) the heuristics are compared head-to-head on a sample of larger instances for which exact ground states are hard to determine. Here, we put these studies into a larger context. We show that the claimed superiority is at best marginal for the smaller samples and becomes essentially irrelevant, relative to any sensible approximation of the true ground states, for the larger samples. For example, the method is of no use for determining stiffness exponents $\theta$ in $d>2$, an application the authors mention. There the problem is not only NP-hard but also requires subtracting two nearly equal ground-state energies, so the systematic errors of $\approx 1\%$ in each energy found here are unacceptable. This larger picture of the method emerges from a straightforward study of finite-size corrections over the spin-glass ensembles the authors employ, based on data that have been available for decades.
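A finite-size corrections study of this kind typically extrapolates the ensemble-averaged ground-state energy per spin to infinite size. The following is a minimal sketch, not the authors' code or the analysis used here. It assumes the scaling form $e_L = e_\infty + A L^{-\omega}$ with a fixed, hypothetical exponent $\omega$. The function name `fit_finite_size`, the system sizes, and all energies are hypothetical and serve only as an illustration. The sketch also shows why a uniform relative error of 1% in a heuristic's energies carries over directly into the extrapolated $e_\infty$.

```python
def fit_finite_size(sizes, energies, omega):
    """Least-squares fit of e(L) = e_inf + A * L**(-omega) for a fixed omega.

    With omega held fixed (an assumption), the model is linear in
    x = L**(-omega), so ordinary least squares yields e_inf (intercept)
    and A (slope) in closed form.
    """
    xs = [L ** (-omega) for L in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(energies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, energies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical synthetic data: these are NOT measured spin-glass energies.
sizes = [4, 6, 8, 10, 12]
omega = 2.0              # assumed correction exponent
true_e_inf = -1.70       # assumed infinite-size energy per spin
true_A = 0.8             # assumed correction amplitude
exact = [true_e_inf + true_A * L ** (-omega) for L in sizes]

# A heuristic that systematically misses the ground state by 1% returns
# energies that are 1% higher (less negative) at every size.
biased = [0.99 * e for e in exact]

e_inf_exact, A_exact = fit_finite_size(sizes, exact, omega)
e_inf_biased, A_biased = fit_finite_size(sizes, biased, omega)

print(f"extrapolated e_inf (exact energies):     {e_inf_exact:.4f}")
print(f"extrapolated e_inf (1% biased energies): {e_inf_biased:.4f}")
```

Because the fit is linear in the energies, a uniform 1% bias rescales both the intercept and the slope by the same factor. No amount of extrapolation removes it. When two such energies of size $\propto L^d$ are subtracted to estimate a stiffness exponent, a bias of this relative size can easily exceed the small energy difference being measured.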