Traditional approaches to designing multi-agent navigation algorithms treat the environment as a fixed constraint, even though spatial constraints influence agents' performance. Yet hand-designing conducive environment layouts is inefficient and potentially expensive. The goal of this paper is to treat the environment as a decision variable in a system-level optimization problem that incorporates both agent performance and environment cost. To this end, we propose two novel problems, unprioritized and prioritized environment optimization, where the former treats all agents equally and the latter accounts for agent priorities. We formally prove the conditions under which the environment can change while guaranteeing completeness (i.e., all agents reach their goals), and analyze the role of agent priorities in environment optimization. We then impose real-world constraints on the environment optimization and formulate it mathematically as a constrained stochastic optimization problem. Since the relation between agents, environment, and performance is challenging to model, we leverage reinforcement learning to develop a model-free solution and a primal-dual mechanism to handle the constraints. Distinct information processing architectures are integrated for various implementation scenarios, including online/offline optimization and discrete/continuous environments. Numerical results corroborate the theory and demonstrate the validity and adaptability of our approach.
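The primal-dual mechanism mentioned in the abstract can be illustrated with a minimal sketch on a toy problem. This is not the paper's algorithm: the objective `f`, the constraint `g`, the function name `primal_dual`, and the step sizes are all hypothetical, and analytic gradients stand in for the model-free reinforcement learning estimates that the abstract describes. The sketch only shows the general pattern of ascending a Lagrangian in the primal variable while adjusting a non-negative multiplier according to constraint violation.

```python
def primal_dual(steps=2000, lr_primal=0.05, lr_dual=0.05):
    """Toy primal-dual iteration (illustrative assumption, not the paper's method).

    Problem: maximize f(x) = -(x - 2)^2  subject to  g(x) = x - 1 <= 0.
    Lagrangian: L(x, lam) = f(x) - lam * g(x), with lam >= 0.
    """
    x = 0.0    # primal variable (hypothetical stand-in for an environment parameter)
    lam = 0.0  # dual variable (Lagrange multiplier), kept non-negative

    for _ in range(steps):
        grad_f = -2.0 * (x - 2.0)  # gradient of the objective f at x
        g = x - 1.0                # constraint value; positive means violated

        # Primal step: gradient ascent on the Lagrangian with respect to x.
        x_next = x + lr_primal * (grad_f - lam)

        # Dual step: raise lam when the constraint is violated, lower it
        # otherwise, and project back onto lam >= 0.
        lam = max(0.0, lam + lr_dual * g)

        x = x_next

    return x, lam


if __name__ == "__main__":
    x_final, lam_final = primal_dual()
    print(f"x = {x_final:.6f}, lambda = {lam_final:.6f}")
```

For this toy problem the constrained optimum is x = 1 with multiplier 2, so the iterates should settle near those values. In a model-free setting, the gradient and the constraint value would presumably be replaced by estimates obtained from sampled agent rollouts.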