The Team Orienteering Problem (TOP) generalizes many real-world multi-agent scheduling and routing tasks that occur in autonomous mobility, aerial logistics, and surveillance applications. While many variants of the TOP exist for planning in multi-agent systems, they assume that all agents cooperate toward a single objective; they therefore do not extend to settings in which agents compete in reward-scarce environments. We propose Stochastic Prize-Collecting Orienteering Games (SPCOG) as an extension of the TOP for planning in the presence of self-interested agents operating on a graph under energy constraints and stochastic transitions. A theoretical discussion on complete and star graphs establishes that SPCOGs admit a unique pure Nash equilibrium, which coincides with the optimal routing solution of an equivalent TOP under rank-based conflict resolution. We introduce the Ordinal Rank (OR) as a concise representation of an agent's global rank and its location within a topological, well-defined neighborhood. Empirical evaluations on real-world road-network graphs, under both dynamic and stationary prize distributions, show that in parameter-sharing settings, policies that leverage local information can outperform policies that leverage global information when the former are conditioned on the OR rather than on the global rank. This indicates that the OR acts as a strong inductive bias in multi-agent games on graphs. The OR-conditioned policies also generalize much better to games with a large number of agents than global-rank-conditioned policies do. Finally, we propose Fictitious Ordinal Response Learning (FORL), an entropy-regulated algorithm for obtaining convergent policies in independent-learning settings in prize-collecting games on graphs.