Visual planning methods are promising for handling complex settings in which extracting the system state is challenging. However, no existing work tackles the case of multiple heterogeneous agents, which are characterized by different capabilities and/or embodiments. In this work, we propose a method for visual action planning in multi-agent settings that plans over a roadmap built in a low-dimensional structured latent space. To enable multi-agent settings, we infer possible parallel actions from a dataset composed of tuples associated with individual actions. We then evaluate the feasibility and cost of these actions based on the capabilities of the multi-agent system and endow the roadmap with this information, building a capability latent space roadmap (C-LSR). Additionally, we design a capability suggestion strategy that informs the human operator about possibly missing capabilities when no path is found. The approach is validated on a simulated burger cooking task and a real-world box packing task.
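To make the capability-aware planning idea concrete, the following is a minimal sketch and not the C-LSR implementation itself. It assumes the roadmap is already available as a list of directed edges between abstract node labels (the latent-space encoding and roadmap construction are omitted), that each edge carries the capabilities required by its possibly parallel actions, that every action in an edge must be assigned to a distinct agent, and that every edge has unit cost. All names (`feasible`, `shortest_path`, `plan`, `suggest_capabilities`, the node labels, and the agent names) are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import permutations


def feasible(required, agents):
    """True if each required capability can go to a distinct agent that has it."""
    if len(required) > len(agents):
        return False
    return any(
        all(cap in agents[name] for cap, name in zip(required, team))
        for team in permutations(agents, len(required))
    )


def shortest_path(edges, start, goal, usable):
    """Dijkstra with unit edge costs over the edges accepted by `usable`."""
    graph = {}
    for u, v, required in edges:
        if usable(required):
            graph.setdefault(u, []).append(v)
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + 1, nxt, path + [nxt]))
    return None


def plan(edges, agents, start, goal):
    """Plan using only edges the current team can execute."""
    return shortest_path(edges, start, goal, lambda req: feasible(req, agents))


def suggest_capabilities(edges, agents, start, goal):
    """Capabilities needed on an unconstrained path that no agent offers.

    Simplification (assumption): only capability types are compared, not how
    many agents would be needed to hold the same capability at once.
    """
    path = shortest_path(edges, start, goal, lambda req: True)
    if path is None:
        return None  # goal unreachable even with unlimited capabilities
    steps = set(zip(path, path[1:]))
    needed = {cap for u, v, req in edges if (u, v) in steps for cap in req}
    available = set().union(*agents.values())
    return needed - available


# Hypothetical toy roadmap: the second edge holds two parallel actions.
edges = [
    ("raw", "patty_cooked", ("grill",)),
    ("patty_cooked", "burger", ("assemble", "toast")),
]
team = {"arm_a": {"grill", "assemble"}, "arm_b": {"assemble"}}

if plan(edges, team, "raw", "burger") is None:
    print("missing:", suggest_capabilities(edges, team, "raw", "burger"))
```

In this toy setup no agent can toast, so planning fails and the suggestion step reports the missing capability; giving one agent the `toast` capability makes the parallel edge feasible. The brute-force assignment in `feasible` is only reasonable for small teams; a bipartite matching would be the natural replacement for larger ones.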