Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient: it forces the system to complete all sampling steps before any movement can start, which becomes the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction into a single step (a tenfold reduction in, e.g., $\pi_{0.5}$ and X-VLA) while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, demonstrate that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
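To make the uniform-distribution claim concrete, the following is a minimal Monte Carlo sketch under a simplified timing model that is our assumption, not the paper's derivation. It assumes that a new inference starts once per execution horizon, that an environmental change occurs at a uniformly random moment within that cycle, and that the robot first reacts TTFA seconds after the next inference begins. The function name `simulate_reaction_times` and all numeric values (`ttfa`, `horizon_steps`, `control_dt`) are hypothetical and chosen only for illustration.

```python
import random


def simulate_reaction_times(ttfa, horizon_steps, control_dt,
                            n_samples=100_000, seed=0):
    """Sample reaction times under an assumed, simplified timing model.

    Assumptions (illustrative, not taken from the paper):
      * a new inference starts every horizon_steps * control_dt seconds;
      * an event occurs at a uniformly random moment within that cycle;
      * the first reacting action is available ttfa seconds after the
        next inference starts.
    """
    rng = random.Random(seed)
    cycle = horizon_steps * control_dt  # execution horizon in seconds
    samples = []
    for _ in range(n_samples):
        event_time = rng.uniform(0.0, cycle)   # when the change happens
        wait = cycle - event_time              # time until next inference
        samples.append(wait + ttfa)            # reaction = wait + TTFA
    return samples


# Hypothetical numbers: 50 ms TTFA, 10-step horizon, 50 Hz control loop.
samples = simulate_reaction_times(ttfa=0.05, horizon_steps=10, control_dt=0.02)
mean_reaction = sum(samples) / len(samples)
print(f"min={min(samples):.3f}s max={max(samples):.3f}s mean={mean_reaction:.3f}s")
```

Under these assumptions the samples spread evenly between TTFA and TTFA plus the execution horizon, so lowering either quantity shifts or narrows the whole distribution. This is the intuition behind targeting TTFA rather than only total inference time.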