Modern imitation learning methods, including visuomotor and Vision-Language-Action (VLA) policies, typically output high-level action references that are executed by low-level controllers. However, the absence of higher-order reference signals, together with the policy's lack of awareness of the underlying low-level control dynamics during training, inevitably induces an execution gap. As a result, realized actions deviate systematically from policy-commanded ones, which critically affects precision-sensitive manipulation. Prior work modifies either the policy architecture or the low-level controller, and both routes require intrusive changes to the pretrained policy or the packaged controller. This raises a natural question: when the policy and the controller are both treated as inaccessible black boxes, can the execution gap still be bridged? We propose Adaptive Policy Execution (APEX), a plug-and-play framework inserted between the policy and the controller that reconstructs a dynamically feasible reference from policy outputs and adapts at test time according to low-level state feedback, with a provable convergence guarantee. Extensive empirical studies show that APEX reduces controller-induced tracking error by 41.2% on demonstration replay and improves manipulation success by 4.8 to 25.8 percentage points across four visuomotor and VLA policy classes.