The Lenstra-Lenstra-Lovász (LLL) algorithm is a seminal algorithm for lattice basis reduction, yet the bases it outputs in polynomial time fall increasingly far from optimal as the dimension grows. We show that deep reinforcement learning can discover strictly superior, generalizable reduction strategies by interacting with the primitive action space of LLL. We formulate lattice reduction as a single-player Markov Decision Process (MDP) and train a deep residual network with an AlphaZero-style self-play pipeline augmented with adaptive-horizon Monte Carlo Tree Search (MCTS), which couples multi-step network predictions with an entropy-gated expansion mechanism. The resulting policy, DeltaStar, is trained exclusively on small $8$-dimensional $q$-ary lattices and requires fewer primitive row operations than LLL. Crucially, it generalizes zero-shot to unseen moduli and to higher dimensions up to $n=32$.