Neural Combinatorial Optimization has been actively researched over the last eight years. Even though many of the proposed Machine Learning based approaches are compared on the same datasets, the evaluation protocol exhibits essential flaws, and the selection of baselines often neglects State-of-the-Art Operations Research approaches. To address both shortcomings, we propose the Routing Arena, a benchmark suite for Routing Problems that seamlessly integrates a consistent evaluation protocol with the baselines and benchmarks prevalent in the Machine Learning and Operations Research fields. The proposed evaluation protocol covers the two evaluation cases that matter most across applications: first, the solution quality within an a priori fixed time budget, and second, the anytime performance of the respective methods. By relating a method's solution trajectory to a Best Known Solution and to a Base Solver's solution trajectory, we further propose the Weighted Relative Average Performance (WRAP), a novel evaluation metric that quantifies the often-claimed runtime efficiency of Neural Routing Solvers. A first comprehensive experimental evaluation demonstrates that the most recent Operations Research solvers achieve state-of-the-art results in both solution quality and runtime efficiency on the vehicle routing problem. Nevertheless, some findings highlight the advantages of neural approaches and motivate a shift in how neural solvers should be conceptualized.
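The abstract does not spell out the WRAP formula. As a minimal sketch of the underlying idea, scoring a solver's anytime performance against a Best Known Solution (BKS) and a Base Solver, the example below uses the primal integral of Berthold (2013), a standard anytime measure from mixed-integer programming, and divides it by the Base Solver's integral. This normalization and all function names are hypothetical illustrations, not the paper's WRAP definition.

```python
from typing import List, Optional, Tuple

# A trajectory is a list of (time_found_in_seconds, solution_cost) pairs.
Trajectory = List[Tuple[float, float]]


def primal_gap(cost: Optional[float], bks: float) -> float:
    """Primal gap in [0, 1] (Berthold, 2013); 1.0 while no solution exists yet."""
    if cost is None:
        return 1.0
    if cost == bks:  # also covers the case where both are zero
        return 0.0
    if cost * bks < 0:  # opposite signs
        return 1.0
    return abs(cost - bks) / max(abs(cost), abs(bks))


def primal_integral(trajectory: Trajectory, bks: float, time_limit: float) -> float:
    """Area under the step function of the incumbent's primal gap on [0, time_limit]."""
    area = 0.0
    prev_time = 0.0
    incumbent: Optional[float] = None
    for t, cost in sorted(trajectory):
        t = min(t, time_limit)  # improvements after the time limit contribute nothing
        area += primal_gap(incumbent, bks) * (t - prev_time)
        prev_time = t
        incumbent = cost if incumbent is None else min(incumbent, cost)
    area += primal_gap(incumbent, bks) * (time_limit - prev_time)
    return area


def relative_primal_integral(
    solver: Trajectory, base: Trajectory, bks: float, time_limit: float
) -> float:
    """Hypothetical normalization: solver integral / base-solver integral (< 1 beats the base)."""
    base_area = primal_integral(base, bks, time_limit)
    if base_area == 0.0:
        return float("nan")  # base solver matched the BKS instantly; ratio undefined
    return primal_integral(solver, bks, time_limit) / base_area


if __name__ == "__main__":
    bks = 100.0
    solver = [(1.0, 120.0), (3.0, 105.0)]
    base = [(2.0, 110.0)]
    print(primal_integral(solver, bks, 10.0))
    print(relative_primal_integral(solver, base, bks, 10.0))
```

In this toy run the solver's integral is 5/3 and the base solver's is 30/11, so the ratio is 11/18 (about 0.61): the solver reaches good solutions earlier than the base solver. Because the integral accumulates the gap over time, it rewards both final quality and speed, which is the trade-off a fixed-budget comparison alone cannot capture.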