Challenges in real-world robotic applications often stem from managing multiple, dynamically varying entities such as neighboring robots, manipulable objects, and navigation goals. Existing multi-agent control strategies face scalability limitations, struggling to handle arbitrary numbers of entities. Additionally, they often rely on engineered heuristics for assigning entities among agents. We propose a data-driven approach to address these limitations by introducing a decentralized control system using neural network policies trained in simulation. Leveraging permutation-invariant neural network architectures and model-free reinforcement learning, our approach allows control agents to autonomously determine the relative importance of different entities without being biased by their ordering or limited to a fixed capacity. We validate our approach through both simulations and real-world experiments involving multiple wheeled-legged quadrupedal robots, demonstrating their collaborative control capabilities. We demonstrate the effectiveness of our architectural choice through experiments on three exemplary multi-entity problems. Our analysis underscores the pivotal role of end-to-end trained permutation-invariant encoders in achieving scalability and improving task performance in multi-object manipulation and multi-goal navigation problems. The adaptability of our policy is further evidenced by its ability to manage varying numbers of entities in a zero-shot manner, showcasing near-optimal autonomous task distribution and collision avoidance behaviors.
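The core architectural idea above is that a policy observes a variable-size set of entities through a permutation-invariant encoder: each entity is embedded by a shared network, and the embeddings are pooled with a symmetric operation. Below is a minimal illustrative sketch of this principle (a Deep Sets-style encoder with max pooling); the layer sizes and NumPy implementation are our own assumptions, not the paper's actual network.

```python
import numpy as np

def entity_encoder(entities, W, b):
    """Embed each entity with a shared linear layer + ReLU, then max-pool
    across entities. Pooling makes the output invariant to entity ordering
    and lets the encoder accept any number of entities.
    (Hypothetical sketch; dimensions chosen for illustration.)"""
    h = np.maximum(entities @ W + b, 0.0)  # (n_entities, d_hidden)
    return h.max(axis=0)                   # (d_hidden,) pooled set feature

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))    # entity feature dim 4 -> hidden dim 8
b = rng.normal(size=8)

obs = rng.normal(size=(5, 4))              # observation of 5 entities
shuffled = obs[rng.permutation(5)]         # same entities, reordered

# The pooled feature is identical regardless of entity ordering,
# and the same encoder also accepts a different entity count:
assert np.allclose(entity_encoder(obs, W, b),
                   entity_encoder(shuffled, W, b))
_ = entity_encoder(rng.normal(size=(9, 4)), W, b)  # 9 entities, no change needed
```

Because the pooled feature has a fixed size independent of the entity count, a downstream policy head can consume it unchanged, which is what enables the zero-shot handling of varying numbers of entities described above.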