Vision-Language-Action models (VLAs) have demonstrated strong potential for embodied AI, yet their deployment on resource-limited robots remains challenging because of their high memory and computational demands. Post-Training Quantization (PTQ) is an efficient way to reduce these demands, but applying it directly to VLAs often causes severe performance degradation in sequential control. We identify temporal error accumulation as a key factor: quantization perturbations at the vision-language-to-action interface are progressively amplified over successive control steps, leading to kinematic drift in the executed trajectories. To address this issue, we propose Drift-Aware Post-Training Quantization (DA-PTQ), which formulates quantization as a drift-aware optimization problem over sequential decision processes. DA-PTQ consists of two components: (1) Cross-Space Representation Compensation, which mitigates structured distortions between the multimodal representations and the action space to improve action consistency, and (2) Motion-Driven Mixed-Precision Allocation, which assigns bit-widths by minimizing trajectory-level motion errors. Extensive experiments show that DA-PTQ significantly reduces kinematic drift and achieves performance comparable to full-precision models under low-bit settings, enabling practical deployment of VLAs on resource-limited robotic platforms.
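To make the notion of temporal error accumulation concrete, the following toy sketch integrates a sequence of per-step displacement commands into a 1-D position and compares a full-precision rollout with one whose commands pass through a uniform quantizer. It is not the DA-PTQ method: the quantizer is applied directly to the actions rather than to model weights or activations, and the names `quantize`, `rollout`, and `drift`, the step size, and the command values are all illustrative assumptions. It shows only that a small, bounded per-step perturbation can grow into a large trajectory-level deviation.

```python
def quantize(x, step):
    """Uniform round-to-nearest quantizer (illustrative, not the paper's)."""
    return round(x / step) * step


def rollout(actions):
    """Integrate per-step displacement commands into a 1-D position trace."""
    pos, trace = 0.0, []
    for a in actions:
        pos += a
        trace.append(pos)
    return trace


def drift(actions, step):
    """Absolute position gap between full-precision and quantized rollouts."""
    fp_trace = rollout(actions)
    q_trace = rollout([quantize(a, step) for a in actions])
    return [abs(p - q) for p, q in zip(fp_trace, q_trace)]


# Hypothetical episode: 100 identical commands that fall between grid points,
# so the rounding error has the same sign at every step.
actions = [0.013] * 100
errors = drift(actions, step=0.01)

print(f"drift after   1 step : {errors[0]:.3f}")
print(f"drift after 100 steps: {errors[-1]:.3f}")
```

In this toy setting each command is off by at most half a quantization step, yet because the error is structured (same sign at every step) the position gap grows linearly with the horizon. This is the kind of trajectory-level effect that a per-layer reconstruction metric would not reveal.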