This paper gives a detailed review of reinforcement learning (RL) for combinatorial optimization. It traces the history of combinatorial optimization from the 1950s and compares that earlier work with RL algorithms of recent years. The paper focuses on a well-known combinatorial problem, the traveling salesperson problem (TSP), and compares the approach of modern RL algorithms to the TSP with an approach published in the 1970s. By examining the similarities and differences between these methodologies, it shows how RL algorithms have improved as machine learning techniques and computing power have evolved. The paper then briefly introduces deep RL, the deep learning approach to the TSP, which extends the traditional mathematical framework. In deep RL, attention and feature-encoding mechanisms are introduced to generate near-optimal solutions. The survey shows that integrating deep learning mechanisms, such as attention, with RL can effectively approximate solutions to the TSP. The paper also argues that deep learning could serve as a generic approach that can be integrated with any traditional RL algorithm to improve results on the TSP.