Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distribution trajectories at deployment. We analyze why policies latch onto these spurious correlations and find that the problem stems from limited coverage over the space of possible histories during training, a space that grows exponentially with the task horizon. Existing regularization techniques provide inconsistent benefits across tasks because they do not fundamentally address this coverage problem. Motivated by these findings, we propose Big Picture Policies (BPP), an approach that conditions on a minimal set of meaningful keyframes detected by a vision-language model. By projecting diverse rollouts onto a compact set of task-relevant events, BPP substantially reduces the distribution shift between training and deployment without sacrificing expressivity. We evaluate BPP on four challenging real-world manipulation tasks and three simulation tasks, all of which require history conditioning. BPP achieves 70% higher success rates than the strongest comparison method in real-world evaluations.