The efficiency of Gröbner basis computation, the standard engine for solving systems of polynomial equations, depends on the choice of monomial ordering. Despite a near-continuum of possible monomial orders, most implementations rely on static heuristics such as GrevLex, guided primarily by expert intuition. We address this gap by casting the selection of monomial orderings as a reinforcement learning problem over the space of admissible orderings. Our approach leverages domain-informed reward signals that accurately reflect the computational cost of Gröbner basis computations and admits efficient Monte Carlo estimation. Experiments on benchmark problems from systems biology and computer vision show that the resulting learned policies consistently outperform standard heuristics, yielding substantial reductions in computational cost. Moreover, we find that these policies resist distillation into simple interpretable models, providing empirical evidence that deep reinforcement learning allows the agents to exploit non-linear geometric structure beyond the scope of traditional heuristics.
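To make the framing concrete, the sketch below casts ordering selection as a simple epsilon-greedy bandit — a heavily simplified, hypothetical illustration, not the paper's actual method. Here `simulate_cost` is an invented stand-in for a noisy measurement of Gröbner basis computation cost under a given ordering (the quantity the paper estimates by Monte Carlo), and the candidate ordering names are placeholders.

```python
# Hypothetical sketch: selecting a monomial ordering by reinforcement
# learning, reduced to an epsilon-greedy multi-armed bandit. All names
# and cost values are illustrative, not taken from the paper.
import random

def simulate_cost(ordering: str) -> float:
    # Stand-in for ONE noisy Monte Carlo sample of the computational
    # cost of a Groebner basis run under `ordering` (invented numbers).
    base = {"lex": 10.0, "grlex": 6.0, "grevlex": 4.0, "learned": 2.5}
    return base[ordering] + random.gauss(0.0, 0.5)

def epsilon_greedy(orderings, episodes=2000, eps=0.1, seed=0):
    """Learn which ordering minimizes expected cost from noisy samples."""
    random.seed(seed)
    counts = {o: 0 for o in orderings}
    values = {o: 0.0 for o in orderings}  # running mean of reward = -cost
    for _ in range(episodes):
        if random.random() < eps:
            o = random.choice(orderings)          # explore
        else:
            o = max(orderings, key=lambda k: values[k])  # exploit
        reward = -simulate_cost(o)  # one Monte Carlo reward sample
        counts[o] += 1
        # Incremental update of the mean reward estimate for ordering o.
        values[o] += (reward - values[o]) / counts[o]
    return max(orderings, key=lambda k: values[k])

best = epsilon_greedy(["lex", "grlex", "grevlex", "learned"])
```

In the paper's setting, the arm space is the (much larger) space of admissible orderings and the policy is a deep network rather than a lookup table, but the reward structure — negative computational cost, estimated by Monte Carlo sampling — plays the same role as above.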