COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only egocentric views of the world. To effectively plan in this setting, in contrast to learning world dynamics in a single-agent scenario, we must simulate world dynamics conditioned on an arbitrary number of agents' actions given only partial egocentric visual observations of the world. To address this issue of partial observability, we first train generative models to estimate the overall world state given partial egocentric observations. To enable accurate simulation of multiple sets of actions on this world state, we then propose to learn a compositional world model for multi-agent cooperation by factorizing the naturally composable joint actions of multiple agents and compositionally generating the video conditioned on the world state. By leveraging this compositional world model, in combination with Vision Language Models to infer the actions of other agents, we can use a tree search procedure to integrate these modules and facilitate online cooperative planning. We evaluate our methods on three challenging benchmarks with 2-4 agents. The results show our compositional world model is effective and the framework enables the embodied agents to cooperate efficiently with different agents across various tasks and an arbitrary number of agents, showing the promising future of our proposed methods. More videos can be found at https://embodied-agi.cs.umass.edu/combo/.
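To make the planning loop described in the abstract more concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a world state that is just a tuple of agent positions on a 1-D line, whereas the paper works with generated video and egocentric images. Every name in it (`apply_agent_action`, `compose_step`, `guess_other_actions`, `score`, `plan`) is hypothetical. The sketch only illustrates three ideas: the joint action is factorized into per-agent effects that are composed, the other agents' actions are guessed by a separate module (a simple heuristic here, standing in for the Vision Language Model), and a depth-limited tree search over the ego agent's actions picks the next move.

```python
ACTIONS = (-1, 0, 1)  # toy per-agent moves on a 1-D line (assumption)


def apply_agent_action(state, agent, action):
    """Per-agent factor: effect of one agent's action on the world state."""
    new_state = list(state)
    new_state[agent] += action
    return tuple(new_state)


def compose_step(state, joint_action):
    """Compose the per-agent factors to simulate a joint action.

    Works for any number of agents because each factor is applied in turn.
    """
    for agent, action in enumerate(joint_action):
        state = apply_agent_action(state, agent, action)
    return state


def guess_other_actions(state, ego, goal):
    """Hypothetical stand-in for action inference: others step toward the goal."""
    return {i: (goal > p) - (goal < p) for i, p in enumerate(state) if i != ego}


def score(state, goal):
    """Toy cooperative score: negative total distance of all agents to the goal."""
    return -sum(abs(p - goal) for p in state)


def plan(state, ego, goal, depth):
    """Depth-limited tree search over the ego agent's actions.

    Returns (best_value, best_first_action).
    """
    if depth == 0:
        return score(state, goal), None
    others = guess_other_actions(state, ego, goal)
    best_value, best_action = float("-inf"), None
    for a in ACTIONS:
        joint = tuple(a if i == ego else others[i] for i in range(len(state)))
        value, _ = plan(compose_step(state, joint), ego, goal, depth - 1)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


if __name__ == "__main__":
    # Three agents; agent 0 plans two steps ahead toward a shared goal at 2.
    print(plan((0, 5, -3), ego=0, goal=2, depth=2))
```

In this toy setting the world model is exact and cheap, so the search is exhaustive. With a learned generative world model each rollout is expensive, so a practical planner would have to limit how many candidate actions it expands at each step.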

