We present the task of "Social Rearrangement", consisting of cooperative everyday tasks like setting up the dinner table, tidying a house or unpacking groceries in a simulated multi-agent environment. In Social Rearrangement, two robots coordinate to complete a long-horizon task, using onboard sensing and egocentric observations, and no privileged information about the environment. We study zero-shot coordination (ZSC) in this task, where an agent collaborates with a new partner, emulating a scenario where a robot collaborates with a new human partner. Prior ZSC approaches struggle to generalize in our complex and visually rich setting, and on further analysis, we find that they fail to generate diverse coordination behaviors at training time. To counter this, we propose Behavior Diversity Play (BDP), a novel ZSC approach that encourages diversity through a discriminability objective. Our results demonstrate that BDP learns adaptive agents that can tackle visual coordination, and zero-shot generalize to new partners in unseen environments, achieving 35% higher success and 32% higher efficiency compared to baselines.