Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address this, we propose a Multimodal Fused Learning (MMFL) framework that leverages both graph-based and image-based representations to capture complementary aspects of the problem, and learns a policy capable of generating high-quality task plans in real time. Specifically, we first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations. We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module with dedicated bottlenecks that enables effective integration of geometric and spatial features. Extensive experiments show that our MMFL approach significantly outperforms state-of-the-art methods across various GTSP instances while maintaining the computational efficiency required for real-time robotic applications. Physical robot tests further validate its practical effectiveness in real-world scenarios.
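To make the idea of a coordinate-based image builder with adaptive resolution more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the channel layout or the scaling rule, so everything here is an assumption for illustration: the function names `adaptive_resolution` and `build_image`, the two-channel layout (node occupancy and normalized cluster label), the unit-square coordinate range, and the step-wise resolution rule with its default parameters.

```python
import numpy as np


def adaptive_resolution(num_nodes, base=32, nodes_per_step=50, step=16, max_res=128):
    """Hypothetical scaling rule: grow the image side length with instance size.

    All parameters are illustrative assumptions, not values from the paper.
    """
    return min(max_res, base + step * (num_nodes // nodes_per_step))


def build_image(coords, cluster_ids, num_clusters, resolution):
    """Rasterize GTSP nodes (assumed to lie in the unit square) into an image.

    Channel 0 counts how many nodes fall into each pixel.
    Channel 1 stores a normalized cluster label in (0, 1]; if several nodes
    share a pixel, the last one written wins.
    """
    coords = np.asarray(coords, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    img = np.zeros((2, resolution, resolution), dtype=np.float32)

    # Map [0, 1] coordinates to integer pixel indices, clipping the 1.0 edge.
    px = np.clip((coords * resolution).astype(int), 0, resolution - 1)
    cols, rows = px[:, 0], px[:, 1]

    # np.add.at accumulates correctly when several nodes hit the same pixel.
    np.add.at(img[0], (rows, cols), 1.0)
    img[1, rows, cols] = (cluster_ids + 1) / num_clusters
    return img


# Tiny instance: four nodes in three clusters, two nodes sharing one pixel.
coords = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.5, 0.5]]
cluster_ids = [0, 1, 2, 1]
res = adaptive_resolution(len(coords))
image = build_image(coords, cluster_ids, num_clusters=3, resolution=res)
```

In a setup like this, the resulting array could be fed to an image encoder while the raw coordinates and cluster memberships go to a graph encoder; how the two feature streams are fused is the role of the fusion module described above and is not sketched here.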