Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations, such as jerks, pauses, and jitter, which reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation, where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG, respectively.