Deploying Vision-Language Models (VLMs) on edge devices is challenged by resource constraints and by performance degradation under distribution shifts. While test-time adaptation (TTA) can counteract such shifts, existing methods are too resource-intensive for on-device deployment. To address this challenge, we propose LQA, a lightweight, quantized-adaptive framework for VLMs that combines a modality-aware Selective Hybrid Quantization (SHQ) strategy with a quantized, gradient-free test-time adaptation mechanism, enabling robust and efficient VLM deployment on resource-constrained hardware. Experiments on seven open-source datasets spanning both synthetic and real-world distribution shifts show that LQA improves overall adaptation performance by 4.5\%, uses less memory than full-precision models, and significantly outperforms gradient-based TTA methods while requiring up to 19.9$\times$ less memory. These results demonstrate that LQA offers a practical pathway toward robust, privacy-preserving, and efficient VLM deployment on edge devices.