Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the $\textbf{action chunk length}$ used during training, termed the $\textbf{horizon}$. Our empirical study reveals an inherent trade-off: longer horizons provide stronger global foresight but degrade fine-grained accuracy, while shorter ones sharpen local control yet struggle on long-term tasks, which implies that any fixed choice of a single horizon is suboptimal. To mitigate this trade-off, we propose a $\textbf{mixture of horizons (MoH)}$ strategy. MoH rearranges the action chunk into several segments with different horizons, processes them in parallel with a shared action transformer, and fuses the outputs with a light linear gate. It has three appealing benefits. 1) MoH exploits long-term foresight and short-term precision jointly within a single model, improving both performance and generalizability to complex tasks. 2) MoH is plug-and-play for full-attention action modules, with minimal training or inference overhead. 3) MoH enables dynamic inference with adaptive horizons, which selects stable actions through cross-horizon consensus, achieving 2.5$\times$ higher throughput than baselines while preserving superior performance. Extensive experiments on the flow-based policies $\pi_0$ and $\pi_{0.5}$ and the one-step regression policy $\pi_{\text{reg}}$ demonstrate that MoH yields consistent and significant gains in both simulation and real-world tasks. Notably, under the mixed-task setting, $\pi_{0.5}$ with MoH reaches a new state of the art with a 99$\%$ average success rate on LIBERO after only 30k training iterations. Project page: https://timsty1.github.io/moh/
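To make the fusion and consensus ideas concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how segments are formed, how the gate is parameterized, or how consensus is measured, so everything here is an assumption: the function names `fuse_horizons` and `consensus_prefix` are hypothetical, the gate logits are passed in directly (in a real model they would presumably come from a small linear layer), fusion is taken to be a per-step softmax over the horizons that cover that step, and consensus is taken to be agreement with the fused action within a tolerance `tol`.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_horizons(predictions, gate_logits):
    """Fuse per-horizon action predictions step by step (assumed scheme).

    predictions: dict {horizon h: list of h actions}, each action a list of floats.
    gate_logits: dict {horizon h: list of h scalar logits}, one per step.
    Returns a list of max(h) fused actions.
    """
    max_h = max(predictions)
    fused = []
    for t in range(max_h):
        # Only horizons long enough to cover step t contribute to it.
        active = [h for h in sorted(predictions) if h > t]
        weights = softmax([gate_logits[h][t] for h in active])
        dim = len(predictions[active[0]][t])
        fused.append([
            sum(w * predictions[h][t][d] for w, h in zip(weights, active))
            for d in range(dim)
        ])
    return fused


def consensus_prefix(predictions, fused, tol, min_votes=2):
    """Count leading steps on which the horizons agree (assumed criterion).

    A step counts as stable if at least `min_votes` horizons cover it and
    each of them is within `tol` of the fused action in every dimension.
    """
    count = 0
    for t, action in enumerate(fused):
        active = [h for h in predictions if h > t]
        if len(active) < min_votes:
            break
        agree = all(
            abs(predictions[h][t][d] - action[d]) <= tol
            for h in active
            for d in range(len(action))
        )
        if not agree:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Two horizons (2 and 4 steps) over a 1-D action space; toy values only.
    preds = {2: [[0.0], [1.0]], 4: [[0.0], [1.0], [2.0], [3.0]]}
    logits = {2: [0.0, 0.0], 4: [0.0, 0.0, 0.0, 0.0]}
    fused = fuse_horizons(preds, logits)
    print(fused)
    print(consensus_prefix(preds, fused, tol=0.1))
```

Under this reading, an adaptive-horizon controller would execute only the first `consensus_prefix(...)` fused actions before re-planning, so the executed chunk is longer when the horizons agree and shorter when they diverge.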