Decoupling Numerical and Structural Parameters: An Empirical Study on Adaptive Genetic Algorithms via Deep Reinforcement Learning for the Large-Scale TSP

Hongyu Wang, Yuhan Jing, Yibing Shi, Enjin Zhou, Haotian Zhang, Jialong Shi

From arXiv; 6 pages, 8 figures; accepted by the WCCI-CEC conference

Proper parameter configuration is a prerequisite for the success of Evolutionary Algorithms (EAs). While various adaptive strategies have been proposed, it remains an open question whether all control dimensions contribute equally to algorithmic scalability. To investigate this, we categorize control variables into numerical parameters (e.g., crossover and mutation rates) and structural parameters (e.g., population size and operator switching), hypothesizing that they play distinct roles. This paper presents an empirical study utilizing a dual-level Deep Reinforcement Learning (DRL) framework to decouple and analyze the impact of these two dimensions on the Traveling Salesman Problem (TSP). We employ a Recurrent PPO agent to dynamically regulate these parameters, treating the DRL model as a probe to reveal evolutionary dynamics. Experimental results confirm the effectiveness of this approach: the learned policies outperform static baselines, reducing the optimality gap by approximately 45% on the largest tested instance (rl5915). Building on this validated framework, our ablation analysis reveals a fundamental insight: while numerical tuning offers local refinement, structural plasticity is the decisive factor in preventing stagnation and facilitating escape from local optima. These findings suggest that future automated algorithm design should prioritize dynamic structural reconfiguration over fine-grained probability adjustment. To facilitate reproducibility, the source code is available at https://github.com/StarDream1314/DRLGA-TSP

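To make the distinction between numerical and structural parameters concrete, the following is a minimal sketch of one GA generation whose behavior is set by a single action. It is not the authors' implementation: the paper uses a Recurrent PPO agent, whereas `toy_controller` here is a hypothetical hand-written rule standing in for a learned policy. The names `Action`, `step`, `run`, the operator labels `"swap"` and `"two_opt"`, and all numeric settings are assumptions chosen for illustration. `optimality_gap` uses the common definition (relative excess over the best known tour length), which the abstract does not spell out.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Action:
    crossover_rate: float  # numerical: probability of applying crossover
    mutation_rate: float   # numerical: probability of applying mutation
    pop_size: int          # structural: population size (must be >= 2)
    mutation_op: str       # structural: "swap" or "two_opt" (hypothetical labels)


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def optimality_gap(length, best_known):
    """Relative excess over the best known length (assumed definition)."""
    return (length - best_known) / best_known


def order_crossover(p1, p2, rng):
    """Copy a slice from p1, then fill the remaining slots in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = iter(c for c in p2 if c not in kept)
    return [c if c is not None else next(fill) for c in child]


def mutate(tour, op, rng):
    """Apply the selected mutation operator to a copy of the tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    tour = tour[:]
    if op == "two_opt":
        tour[i:j + 1] = tour[i:j + 1][::-1]  # reverse a segment
    else:
        tour[i], tour[j] = tour[j], tour[i]  # swap two cities
    return tour


def step(population, coords, action, rng):
    """One generation under the given action; returns tours sorted best-first."""
    n = len(coords)
    # Structural control: shrink to the best tours or grow with random ones.
    population = sorted(population, key=lambda t: tour_length(t, coords))
    population = population[:action.pop_size]
    while len(population) < action.pop_size:
        population.append(rng.sample(range(n), n))
    # Numerical control: the two rates gate crossover and mutation.
    children = []
    for _ in range(action.pop_size):
        p1, p2 = rng.sample(population, 2)
        if rng.random() < action.crossover_rate:
            child = order_crossover(p1, p2, rng)
        else:
            child = p1[:]
        if rng.random() < action.mutation_rate:
            child = mutate(child, action.mutation_op, rng)
        children.append(child)
    # Elitist survivor selection keeps the best tour found so far.
    merged = sorted(population + children, key=lambda t: tour_length(t, coords))
    return merged[:action.pop_size]


def toy_controller(stagnant_generations):
    """Hypothetical rule (NOT the paper's learned policy): restructure on stagnation."""
    if stagnant_generations >= 5:
        return Action(0.9, 0.5, 60, "two_opt")
    return Action(0.9, 0.1, 30, "swap")


def run(n_cities=30, generations=200, seed=0):
    """Run the sketch on random points in the unit square; return the best length."""
    rng = random.Random(seed)
    coords = [(rng.random(), rng.random()) for _ in range(n_cities)]
    population = [rng.sample(range(n_cities), n_cities) for _ in range(30)]
    best, stagnant = float("inf"), 0
    for _ in range(generations):
        population = step(population, coords, toy_controller(stagnant), rng)
        current = tour_length(population[0], coords)
        stagnant = 0 if current < best else stagnant + 1
        best = min(best, current)
    return best
```

Calling `run()` returns the best tour length found on a random 30-city instance. An ablation in the spirit of the abstract could be approximated by freezing either the two rates or the two structural fields in `toy_controller` and comparing the resulting lengths, although a toy instance of this size says nothing about the large-scale results reported in the paper.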