The ability to plan actions on multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models for both low- and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating the magnitude of the rewards with the predictability of low-level actions given the state and the chosen subgoal. We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods. Because of its ability to plan, our algorithm can find better trajectories than the ones in the training set.
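As a rough illustration of the vector-quantization step mentioned in the abstract, the sketch below maps continuous subgoal embeddings to their nearest entries in a discrete codebook. It is a generic nearest-neighbor lookup under assumed names (`quantize`, `codebook`, `embeddings`) and toy values, not the paper's actual model, which the abstract does not specify.

```python
import numpy as np


def quantize(embeddings, codebook):
    """Map each embedding to its nearest codebook vector (hypothetical helper).

    embeddings: array of shape (n, d), e.g. encoded subgoal states.
    codebook:   array of shape (k, d), the discrete set of code vectors.
    Returns the chosen code indices (n,) and the quantized vectors (n, d).
    """
    # Pairwise squared Euclidean distances via broadcasting, shape (n, k).
    diffs = embeddings[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Pick the closest code for every embedding.
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]


# Toy data, chosen only for illustration.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
embeddings = np.array([[0.1, -0.2], [0.9, 1.2], [3.5, 0.4]])

indices, quantized = quantize(embeddings, codebook)
print(indices)    # index of the nearest code per embedding
print(quantized)  # the corresponding codebook vectors
```

Restricting subgoals to a finite codebook like this is what would make subgoal-level planning a search over discrete options; how the codebook is trained and how the planner uses it are left to the full paper.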