This paper proposes a novel approach to a common problem: standard supervised finetuning (SFT) of pretrained vision-language-action (VLA) models often fails to improve performance effectively or to reduce adaptation costs. Some advanced finetuning methods that add auxiliary training objectives can improve performance and reduce the number of steps needed to converge. However, the extra losses from these auxiliary tasks typically incur significant computational overhead. To combine the capability gains of auxiliary training with the simplicity of standard SFT, we decouple the two objectives of auxiliary-task training in parameter space: enhancing general capabilities and fitting task-specific action distributions. To achieve this, we only need to train the model to convergence on a small-scale task set with two distinct training strategies. The difference between the two resulting sets of parameters can then be interpreted as capability vectors contributed by the auxiliary tasks. These vectors are merged with the pretrained parameters to form a capability-enhanced meta model. When standard SFT is further augmented with a lightweight orthogonal regularization loss, the merged model achieves performance comparable to auxiliary-finetuned baselines at a lower computational cost. Experimental results show that this approach is effective across diverse robot tasks. Project page: https://chris1220313648.github.io/Fast-dVLA/
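The parameter arithmetic described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract. The capability vector is taken as the auxiliary-finetuned parameters minus the standard-SFT parameters, and it is added to the pretrained weights with a hypothetical scaling coefficient `scale`. The "orthogonal regularization" is modeled here as the squared cosine similarity between the SFT update (current weights minus the meta model) and the capability vector, summed over tensors. All function names are hypothetical, and parameters are plain Python lists rather than framework tensors.

```python
from math import sqrt
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def capability_vector(theta_aux: Params, theta_sft: Params) -> Params:
    """Assumed definition: tau = theta_aux - theta_sft, tensor by tensor."""
    return {k: [a - s for a, s in zip(theta_aux[k], theta_sft[k])] for k in theta_aux}


def merge(theta_pre: Params, tau: Params, scale: float = 1.0) -> Params:
    """Meta model = pretrained weights + scale * capability vector (scale is hypothetical)."""
    return {k: [p + scale * t for p, t in zip(theta_pre[k], tau[k])] for k in theta_pre}


def orthogonal_penalty(theta: Params, theta_meta: Params, tau: Params,
                       eps: float = 1e-12) -> float:
    """Assumed regularizer: sum of squared cosine similarities between the
    SFT update (theta - theta_meta) and tau. It is zero when the update is orthogonal to tau."""
    total = 0.0
    for k in tau:
        delta = [x - m for x, m in zip(theta[k], theta_meta[k])]
        dot = sum(d * t for d, t in zip(delta, tau[k]))
        norm_d = sqrt(sum(d * d for d in delta))
        norm_t = sqrt(sum(t * t for t in tau[k]))
        total += (dot / (norm_d * norm_t + eps)) ** 2
    return total


if __name__ == "__main__":
    theta_pre = {"w": [0.0, 0.0]}
    theta_sft = {"w": [1.0, 0.0]}   # converged with standard SFT
    theta_aux = {"w": [1.0, 2.0]}   # converged with auxiliary objectives
    tau = capability_vector(theta_aux, theta_sft)
    theta_meta = merge(theta_pre, tau)
    print(tau, theta_meta)
    # An update orthogonal to tau incurs no penalty; a parallel one is penalized.
    print(orthogonal_penalty({"w": [3.0, 2.0]}, theta_meta, tau))
    print(orthogonal_penalty({"w": [0.0, 5.0]}, theta_meta, tau))
```

During finetuning, such a penalty would be added to the SFT loss with some weight (e.g. `loss = sft_loss + beta * penalty`, where `beta` is a hypothetical hyperparameter). This discourages task-specific updates from overwriting the merged capability direction. The actual form of the paper's regularizer may differ.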