Our aim is to build autonomous agents that can solve tasks in environments such as Minecraft. To this end, we use an imitation-learning approach. We formulate the control problem as a search problem over a dataset of expert demonstrations, in which the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the selected expert trajectory as long as the distance between the agent's state representation and that of the selected trajectory does not diverge. Once it diverges, the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and produces human-like agent behavior in the Minecraft environment.
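The search-and-copy loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the latent vectors have already been computed by some encoder (a Video PreTraining model in the paper, plain arrays here), uses Euclidean distance as the proximity measure, and treats divergence as the distance exceeding a fixed threshold. The class name `SearchAgent` and the parameters `max_distance` and `traj_ids` are hypothetical names introduced for illustration.

```python
import numpy as np


class SearchAgent:
    """Copies expert actions and re-searches when the latent distance diverges.

    Assumptions (not from the source): Euclidean distance, a fixed
    divergence threshold, and precomputed latent vectors.
    """

    def __init__(self, latents, actions, traj_ids, max_distance):
        self.latents = np.asarray(latents, dtype=np.float64)  # shape (N, D)
        self.actions = list(actions)                          # one action per frame
        self.traj_ids = np.asarray(traj_ids)                  # trajectory id per frame
        self.max_distance = float(max_distance)
        self.cursor = None       # index of the expert frame being followed
        self.num_searches = 0    # how often the proximity search was run

    def _search(self, z):
        # Proximity search: nearest expert frame over the whole dataset.
        distances = np.linalg.norm(self.latents - z, axis=1)
        self.cursor = int(np.argmin(distances))
        self.num_searches += 1

    def _needs_search(self, z):
        if self.cursor is None:
            return True
        # Diverged if the followed expert frame is too far from the agent state.
        gap = np.linalg.norm(self.latents[self.cursor] - z)
        return bool(gap > self.max_distance)

    def act(self, z):
        """Return the expert action to copy for the agent latent state z."""
        z = np.asarray(z, dtype=np.float64)
        if self._needs_search(z):
            self._search(z)
        action = self.actions[self.cursor]
        nxt = self.cursor + 1
        # Advance along the same trajectory; at its end, force a new search.
        same_traj = nxt < len(self.actions) and self.traj_ids[nxt] == self.traj_ids[self.cursor]
        self.cursor = nxt if same_traj else None
        return action


# Toy dataset: two expert trajectories in a 2-D latent space.
latents = [[0, 0], [1, 0], [2, 0], [10, 10], [11, 10]]
actions = ["a0", "a1", "a2", "b0", "b1"]
traj_ids = [0, 0, 0, 1, 1]
agent = SearchAgent(latents, actions, traj_ids, max_distance=0.5)
```

Calling `agent.act(z)` once per environment step with the current latent state returns the action to execute. In this sketch a new search is triggered on the first step, whenever the threshold is exceeded, and when the followed trajectory runs out of frames; a real system would need an approximate nearest-neighbor index rather than the brute-force scan used here.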