Machine Learning-based Two-Stage Graph Sparsification for the Travelling Salesman Problem

High-performance TSP solvers such as Lin-Kernighan-Helsgaun (LKH) search within a \emph{candidate graph} -- a small subset of edges pre-selected for the solver -- rather than over the complete graph. The two leading sparsification heuristics, $\alpha$-Nearest and POPMUSIC, each miss the desired balance between density and coverage: $\alpha$-Nearest is dense but has stable recall, while POPMUSIC is sparser but its recall degrades with scale. Their union closes the recall gap while remaining far below the complete graph in density, which leaves room for further reduction. Existing learning-based sparsifiers score edges on the complete graph, an approach that is expensive and largely limited to Euclidean instances. We propose a two-stage method that inverts this logic. Stage~1 takes the union of $\alpha$-Nearest and POPMUSIC, achieving near-perfect recall at ${\sim}6N$ edges. Crucially, the union annotates each edge with its \emph{source provenance}: whether it was endorsed by $\alpha$-Nearest, POPMUSIC, or both. Stage~2 trains a lightweight classifier on these annotated edges and prunes the lowest-scoring ones. Because dual-source edges are almost always optimal, the learning problem reduces to filtering the single-source subset, a substantially easier task than classifying all $O(N^2)$ edges from scratch. Across four distance types, five spatial distributions, and problem sizes from 50 to 500, the pipeline reduces candidate-graph density by $37$--$47\%$ while retaining ${\geq}99.69\%$ of optimal-tour edges. On TSP500, it matches or exceeds the coverage of recent Euclidean-only neural sparsifiers at lower density.
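The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `keep_fraction` parameter, the rule of always keeping dual-source edges, and the `score` callable (a stand-in for the trained classifier) are all assumptions made for illustration. The two input edge lists are assumed to have been produced beforehand by $\alpha$-Nearest and POPMUSIC.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]
# Provenance flags: (endorsed by alpha-Nearest, endorsed by POPMUSIC)
Provenance = Tuple[bool, bool]


def normalize(u: int, v: int) -> Edge:
    """Store undirected edges with the smaller endpoint first."""
    return (u, v) if u < v else (v, u)


def union_with_provenance(
    alpha_edges: Iterable[Edge], popmusic_edges: Iterable[Edge]
) -> Dict[Edge, Provenance]:
    """Stage 1 (sketch): union of both candidate sets, tagging each edge with its source."""
    alpha = {normalize(u, v) for u, v in alpha_edges}
    pop = {normalize(u, v) for u, v in popmusic_edges}
    return {e: (e in alpha, e in pop) for e in alpha | pop}


def prune_single_source(
    candidates: Dict[Edge, Provenance],
    score: Callable[[Edge], float],
    keep_fraction: float,
) -> Set[Edge]:
    """Stage 2 (sketch): keep dual-source edges, filter single-source edges by score.

    Assumption: dual-source edges are kept unconditionally, and `score`
    is a hypothetical stand-in for the trained classifier.
    """
    dual = {e for e, (a, p) in candidates.items() if a and p}
    single = sorted(
        (e for e in candidates if e not in dual), key=score, reverse=True
    )
    n_keep = int(keep_fraction * len(single))  # truncate toward zero
    return dual | set(single[:n_keep])


if __name__ == "__main__":
    # Toy edge lists; real ones would come from alpha-Nearest and POPMUSIC.
    alpha_edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
    popmusic_edges = [(1, 0), (2, 1), (3, 0)]
    candidates = union_with_provenance(alpha_edges, popmusic_edges)

    # Made-up scores for the three single-source edges.
    toy_scores = {(2, 3): 0.9, (0, 2): 0.1, (0, 3): 0.5}
    kept = prune_single_source(
        candidates, lambda e: toy_scores.get(e, 0.0), keep_fraction=0.4
    )
    print(sorted(kept))
```

In this toy run the two dual-source edges survive regardless of score, and only the single-source edges compete for the remaining slots, which mirrors the reduction of the learning problem to the single-source subset.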
