Recent years have witnessed a surge in research on machine learning for combinatorial optimization, since learning-based approaches can outperform traditional heuristics and approximate the solutions of exact solvers at a lower computational cost. However, most existing work on supervised neural combinatorial optimization focuses on TSP instances with a fixed number of cities and requires large amounts of training samples to achieve good performance, which makes it less practical for realistic optimization scenarios. This work aims to develop a data-driven graph representation learning method for solving travelling salesman problems (TSPs) with varying numbers of cities. To this end, we propose an edge-aware graph autoencoder (EdgeGAE) model that learns to solve TSPs after being trained on solution data of various sizes with an imbalanced distribution. We formulate the TSP as a link prediction task on sparse connected graphs. A residual gated encoder is trained to learn latent edge embeddings, followed by an edge-centered decoder that outputs link predictions in an end-to-end manner. To improve the model's ability to generalize to large-scale problems, we introduce an active sampling strategy into the training process. In addition, we generate a benchmark dataset containing 50,000 TSP instances ranging in size from 50 to 500 cities and following an extremely scale-imbalanced distribution, making it well suited to investigating the model's performance in practical applications. We conduct experiments using different amounts of training data at various scales, and the experimental results demonstrate that the proposed data-driven approach achieves highly competitive performance among state-of-the-art learning-based methods for solving TSPs.
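The link-prediction formulation can be illustrated with a minimal sketch. The abstract does not state how the sparse graph is constructed, so the k-nearest-neighbour construction below and the helper names `knn_edges`, `tour_edge_set`, and `edge_labels` are assumptions made for illustration, not the EdgeGAE implementation. The sketch only shows how a solved instance might be turned into binary edge targets: candidate edges that lie on the reference tour are labelled 1, all others 0.

```python
import math


def knn_edges(coords, k):
    """Candidate edges (i, j) with i < j, linking each city to its k nearest cities.

    Assumption: a k-nearest-neighbour graph stands in for the sparse graph;
    the abstract does not specify the construction.
    """
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        # Other cities ordered by Euclidean distance from city i.
        by_dist = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(xi - coords[j][0], yi - coords[j][1]),
        )
        for j in by_dist[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def tour_edge_set(tour):
    """Undirected edges of a closed tour given as a permutation of city indices."""
    n = len(tour)
    result = set()
    for t in range(n):
        a, b = tour[t], tour[(t + 1) % n]  # wrap around to close the tour
        result.add((min(a, b), max(a, b)))
    return result


def edge_labels(edges, tour):
    """Binary link-prediction targets: 1 if a candidate edge is on the tour, else 0."""
    on_tour = tour_edge_set(tour)
    return [1 if e in on_tour else 0 for e in edges]


# Toy instance: four cities on the unit square; the reference tour is the perimeter.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour = [0, 1, 2, 3]

edges = knn_edges(coords, k=3)     # k=3 also admits the two diagonals
labels = edge_labels(edges, tour)  # perimeter edges -> 1, diagonals -> 0

for edge, label in zip(edges, labels):
    print(edge, label)
```

In this toy case the four perimeter edges receive label 1 and the two diagonals receive label 0. A k-nearest-neighbour graph is not guaranteed to be connected or to contain every tour edge, so a real pipeline would need to check both properties for the chosen k.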