Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distribution trajectories at deployment. We analyze why policies latch onto these spurious correlations and find that the problem stems from limited coverage over the space of possible histories during training, which grows exponentially with horizon. Existing regularization techniques provide inconsistent benefits across tasks, as they do not fundamentally address this coverage problem. Motivated by these findings, we propose Big Picture Policies (BPP), an approach that conditions on a minimal set of meaningful keyframes detected by a vision-language model. By projecting diverse rollouts onto a compact set of task-relevant events, BPP substantially reduces distribution shift between training and deployment without sacrificing expressivity. We evaluate BPP on four challenging real-world manipulation tasks and three simulation tasks, all requiring history conditioning. BPP achieves success rates 70% higher than the strongest baseline in real-world evaluations. Videos are available at https://bigpicturepolicies.github.io/
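The core idea of conditioning on a compact keyframe history, rather than the full observation stream, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `KeyframeBuffer` class, its methods, and the toy keyframe rule are all assumptions; in BPP a vision-language model would decide which observations count as task-relevant keyframes.

```python
# Hypothetical sketch of keyframe-conditioned history (all names are assumptions).
from dataclasses import dataclass, field

@dataclass
class KeyframeBuffer:
    """Retains only observations flagged as task-relevant events, so the
    policy conditions on a compact history instead of every past frame."""
    max_keyframes: int = 8
    frames: list = field(default_factory=list)

    def update(self, observation, is_keyframe: bool):
        # In BPP, a vision-language model would supply `is_keyframe`.
        if is_keyframe:
            self.frames.append(observation)
            self.frames = self.frames[-self.max_keyframes:]

    def context(self, current_observation):
        # Policy input: compact keyframe history plus the current frame.
        return self.frames + [current_observation]

buf = KeyframeBuffer(max_keyframes=2)
for t, obs in enumerate(["open_drawer", "grasp", "search_shelf", "found"]):
    # Toy stand-in for the VLM keyframe detector.
    buf.update(obs, is_keyframe=(t % 2 == 0))
print(buf.context("now"))  # prints ['open_drawer', 'search_shelf', 'now']
```

Because the buffer holds at most a fixed number of keyframes, the space of histories the policy sees is far smaller than the exponential space of raw trajectories, which is the coverage argument the abstract makes.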