Behavioral cloning uses a dataset of demonstrations to learn a behavioral policy. To address learning and policy adaptation problems, we propose using a latent space to index a demonstration dataset, instantly access similar, relevant experiences, and copy behavior from those situations. The agent performs the actions from a selected similar situation until the representations of its current situation and of the selected experience diverge in the latent space. We thus formulate the control problem as a search problem over a dataset of expert demonstrations. We test our approach on the BASALT MineRL dataset, using the latent representation of a Video PreTraining model, and compare it to state-of-the-art Minecraft agents. Our approach effectively recovers meaningful demonstrations and produces human-like agent behavior in the Minecraft environment across a wide variety of scenarios. Experimental results show that the performance of our search-based approach is comparable to that of trained models, while allowing zero-shot task adaptation by changing the demonstration examples.
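The search-and-copy loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `LatentSearchPolicy`, the Euclidean distance, the fixed `divergence_threshold`, and the brute-force nearest-neighbour search are all hypothetical choices, and the latent vectors are assumed to have been precomputed by some encoder (for example a Video PreTraining model) that is not shown here.

```python
import numpy as np


class LatentSearchPolicy:
    """Illustrative search-based controller over a demonstration dataset.

    Assumptions (not taken from the paper): Euclidean distance in latent
    space, a fixed divergence threshold, and brute-force search.
    """

    def __init__(self, latents, actions, episode_ids, divergence_threshold):
        self.latents = np.asarray(latents, dtype=np.float64)  # shape (N, D)
        self.actions = list(actions)                          # length N
        self.episode_ids = np.asarray(episode_ids)            # length N
        self.threshold = float(divergence_threshold)
        self.cursor = None  # index of the demonstration frame being followed

    def _search(self, z):
        # Index of the demonstration frame closest to the current latent.
        dists = np.linalg.norm(self.latents - z, axis=1)
        return int(np.argmin(dists))

    def _diverged(self, z):
        # True when the current latent has drifted away from the followed frame.
        return np.linalg.norm(self.latents[self.cursor] - z) > self.threshold

    def act(self, z):
        z = np.asarray(z, dtype=np.float64)
        # Search again at the start, after a demonstration ends, or on divergence.
        if self.cursor is None or self._diverged(z):
            self.cursor = self._search(z)
        action = self.actions[self.cursor]
        # Advance along the same demonstration; reset at its last frame.
        nxt = self.cursor + 1
        if nxt < len(self.actions) and self.episode_ids[nxt] == self.episode_ids[self.cursor]:
            self.cursor = nxt
        else:
            self.cursor = None
        return action
```

In this sketch, swapping in a different set of `latents`, `actions`, and `episode_ids` changes the agent's behavior without any retraining, which mirrors the zero-shot adaptation idea in the text. A practical system would likely replace the brute-force search with an approximate nearest-neighbour index.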