Vision-language-action (VLA) models are powerful action generators for robot manipulation, but they are typically executed with fixed inference and replanning schedules. This rigidity ignores the uneven difficulty of robot control: contact-rich or uncertain states may need more computation and fresher feedback, while easier states can often be handled with fewer inference steps and longer open-loop execution. We propose Elastic Queries Reinforcement Learning (EQRL), a framework that makes each VLA policy query elastic. A lightweight latent-schedule adaptor jointly selects the latent input, denoising budget, and action chunk length, without fine-tuning the underlying VLA model. To make scheduling difficulty-aware, EQRL trains a critic over the joint latent-schedule action and derives a state difficulty signal from critic ensemble disagreement. This signal guides compute toward difficult states, while a learned residual allows task-driven correction. We formulate variable chunk execution as query-level macro-action RL with chunk-dependent discounting and an amortized number-of-function-evaluations (NFE) budget. Across simulation and real-robot manipulation, EQRL reduces amortized inference cost while preserving or improving task success.
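To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that chunk-dependent discounting means bootstrapping with `gamma**k` after a chunk of `k` steps, that the amortized NFE is total denoising evaluations divided by total executed environment steps, and that ensemble disagreement is measured as the standard deviation of critic estimates. The `Query` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Sequence


@dataclass
class Query:
    """One policy query (hypothetical record, not taken from the paper)."""
    chunk_rewards: Sequence[float]  # per-step rewards observed while the chunk runs open-loop
    nfe: int                        # function evaluations (denoising steps) spent on this query


def macro_reward(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of the rewards collected inside a single chunk."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))


def macro_target(rewards: Sequence[float], next_value: float, gamma: float) -> float:
    """Query-level TD target: a chunk of length k discounts the next value by gamma**k."""
    k = len(rewards)
    return macro_reward(rewards, gamma) + gamma ** k * next_value


def amortized_nfe(queries: Sequence[Query]) -> float:
    """Average function evaluations per executed environment step (assumed definition)."""
    total_nfe = sum(q.nfe for q in queries)
    total_steps = sum(len(q.chunk_rewards) for q in queries)
    return total_nfe / total_steps


def difficulty(q_values: Sequence[float]) -> float:
    """Ensemble disagreement as the population std of critic estimates (assumed measure)."""
    return pstdev(q_values)
```

Under these assumptions, a query that spends 10 evaluations on a 2-step chunk followed by one that spends 2 evaluations on an 8-step chunk averages 1.2 evaluations per executed step, which illustrates how long, cheap chunks in easy states can offset short, expensive ones in hard states.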