Robotic imitation learning faces a fundamental trade-off between modeling long-horizon dependencies and enabling fine-grained closed-loop control. Existing action chunking approaches predict actions at a single fixed frequency and struggle to achieve both. To address this limitation, we propose HiPolicy, a hierarchical multi-frequency action chunking framework that jointly predicts action sequences at different frequencies to capture both coarse high-level plans and precise reactive motions. For multi-frequency chunk generation, we extract and fuse hierarchical features from the observation history aligned to each frequency. We also introduce an entropy-guided execution mechanism that adaptively balances long-horizon planning with fine-grained control based on action uncertainty. Experiments on diverse simulated benchmarks and real-world manipulation tasks show that HiPolicy can be seamlessly integrated into existing 2D and 3D generative policies, delivering consistent performance improvements while significantly enhancing execution efficiency.
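To make the idea of entropy-guided execution concrete, the following is a minimal illustrative sketch, not HiPolicy's actual implementation. It assumes that the policy can draw several action samples per time step of a chunk, that uncertainty is approximated by the entropy of a diagonal Gaussian fitted to those samples, and that execution stops at the first step whose entropy exceeds a threshold. The function names, the `threshold` and `min_steps` parameters, and the stopping rule itself are all hypothetical.

```python
import math
from statistics import pvariance


def gaussian_entropy(samples):
    """Differential entropy of a diagonal Gaussian fitted to action samples.

    samples: list of action vectors (lists of floats) for ONE time step.
    Using a Gaussian fit as the uncertainty proxy is an assumption of this sketch.
    """
    eps = 1e-8  # keeps the log finite when all samples agree exactly
    dims = zip(*samples)  # regroup the samples by action dimension
    return sum(
        0.5 * math.log(2 * math.pi * math.e * (pvariance(d) + eps))
        for d in dims
    )


def steps_to_execute(chunk_samples, threshold, min_steps=1):
    """Return how many steps of a chunk to run open-loop before re-planning.

    chunk_samples[t] holds the sampled actions for step t of the chunk.
    Hypothetical rule: always run at least min_steps, then stop at the first
    step whose entropy exceeds the threshold.
    """
    n = 0
    for step_samples in chunk_samples:
        if n >= min_steps and gaussian_entropy(step_samples) > threshold:
            break
        n += 1
    return n


# Two low-uncertainty steps followed by a high-uncertainty one.
confident = [[0.0, 0.0], [0.001, 0.001], [0.0, 0.001]]
uncertain = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
chunk = [confident, confident, uncertain, confident]
planned_steps = steps_to_execute(chunk, threshold=0.0)
```

Under this rule, low-entropy stretches are executed as long open-loop chunks, while high-entropy steps trigger earlier re-planning, which is one simple way to trade long-horizon planning against closed-loop reactivity.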