Reinforcement learning and imitation learning approaches rely on policy learning strategies that struggle to generalize from only a few examples of a task. In this work, we propose a language-conditioned, semantic search-based method that derives an online search-based policy from an available demonstration dataset of state-action trajectories. The policy acquires its actions directly from the most similar manipulation trajectories found in the dataset. Our approach outperforms the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities. These results suggest that online search-based policies could be extended to tasks typically addressed by imitation learning or reinforcement learning-based policies.
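To make the idea of a search-based policy concrete, the following is a minimal sketch rather than the method evaluated in this work. It assumes that each demonstration stores a precomputed instruction embedding, per-step state embeddings, and the recorded actions, and that similarity is measured by cosine similarity. The names `Demonstration`, `cosine_similarity`, and `search_policy`, as well as the two-stage lookup (first by instruction, then by state), are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


@dataclass
class Demonstration:
    # Assumed data layout: one instruction embedding per trajectory,
    # plus one state embedding and one action per time step.
    instruction_embedding: Vector
    state_embeddings: List[Vector]
    actions: List[Vector]


def search_policy(
    demos: List[Demonstration],
    instruction_embedding: Vector,
    state_embedding: Vector,
) -> Vector:
    """Return the action stored at the most similar step of the most similar demo."""
    # Stage 1 (assumption): pick the demonstration whose instruction is closest.
    best_demo = max(
        demos,
        key=lambda d: cosine_similarity(d.instruction_embedding, instruction_embedding),
    )
    # Stage 2 (assumption): within that demonstration, pick the closest state.
    best_step = max(
        range(len(best_demo.state_embeddings)),
        key=lambda t: cosine_similarity(best_demo.state_embeddings[t], state_embedding),
    )
    # The action is copied from the dataset; no policy network is trained.
    return best_demo.actions[best_step]
```

In an online setting, `search_policy` would be called at every control step with the embedding of the current observation, so the retrieved step can change as the scene evolves. How the embeddings are produced is left open here.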