Vision-Language-Action (VLA) models that encode actions as discrete tokens are increasingly adopted for robotic manipulation, but existing decoding paradigms share a fundamental limitation. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, a token is typically fixed once it is generated and cannot be revised in later iterations, so early token errors cannot be effectively corrected. We propose DFM-VLA, a discrete flow matching VLA for iterative refinement of action tokens. DFM-VLA models a token-level probability velocity field that dynamically updates the full action sequence across refinement iterations. We investigate two ways to construct the velocity field: an auxiliary velocity-head formulation and an action-embedding-guided formulation. Our framework further adopts a two-stage decoding strategy in which an iterative refinement stage is followed by deterministic validation for stable convergence. Extensive experiments on CALVIN, LIBERO, and real-world manipulation tasks show that DFM-VLA consistently outperforms strong autoregressive, discrete diffusion, and continuous diffusion baselines in manipulation performance while retaining high inference efficiency. In particular, DFM-VLA achieves an average success length of 4.44 on CALVIN and an average success rate of 95.7% on LIBERO, highlighting the value of action refinement via discrete flow matching for robotic manipulation. Our project page is available at https://chris1220313648.github.io/DFM-VLA/
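The abstract does not spell out the velocity field or the decoding schedule, so the following is only a toy sketch of the general idea: every token position may be resampled at every refinement step, and a few deterministic passes then fix the result. It assumes a generic mixture-path Euler step from the discrete flow matching literature (jump probability h / (1 - t), with jumps drawn from a predicted clean-token distribution), not DFM-VLA's velocity-head or embedding-guided formulations. The names `toy_posterior` and `refine_then_validate`, the step counts, and the fixed `target` are all hypothetical stand-ins for a trained, observation-conditioned model.

```python
import numpy as np


def toy_posterior(tokens, target, vocab_size, eps=0.1):
    """Hypothetical stand-in for a trained network p(x_1 | x_t, observation).

    It ignores `tokens` and returns, for each position, a smoothed one-hot
    distribution centred on a fixed target token. Shape: (length, vocab_size).
    """
    probs = np.full((len(target), vocab_size), eps / vocab_size)
    probs[np.arange(len(target)), target] += 1.0 - eps
    return probs


def refine_then_validate(target, vocab_size, n_refine=8, n_validate=2, seed=0):
    """Toy two-stage decoding: stochastic refinement, then deterministic passes."""
    rng = np.random.default_rng(seed)
    length = len(target)
    # Start from a uniformly random token sequence (assumed source distribution).
    tokens = rng.integers(0, vocab_size, size=length)
    h = 1.0 / n_refine

    # Stage 1: iterative refinement. Any position can jump at any step,
    # so a token chosen early can still be revised later.
    for k in range(n_refine):
        t = k * h
        probs = toy_posterior(tokens, target, vocab_size)
        jump_prob = min(1.0, h / (1.0 - t))  # Euler step for the linear schedule
        jump = rng.random(length) < jump_prob
        proposal = np.array([rng.choice(vocab_size, p=p) for p in probs])
        tokens = np.where(jump, proposal, tokens)

    # Stage 2: deterministic passes that take the most likely token per position.
    for _ in range(n_validate):
        probs = toy_posterior(tokens, target, vocab_size)
        tokens = probs.argmax(axis=1)
    return tokens
```

Because the toy posterior always peaks on `target`, calling `refine_then_validate(np.array([3, 1, 4, 1, 5]), vocab_size=8)` returns the target sequence after the deterministic stage. With a real model the posterior would depend on the current tokens and the observation, which is where revisiting earlier tokens would matter.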