Deploying Vision-Language Models (VLMs) on edge devices remains challenging because their computational and memory demands exceed the capabilities of resource-constrained embedded platforms. Conversely, fully offloading inference to the cloud is often impractical in bandwidth-limited environments, where transmitting raw visual data introduces substantial latency. Recent edge-cloud collaborative architectures partition VLM workloads across devices, but they typically transmit fixed-size representations, so they cannot adapt to dynamic network conditions and do not fully exploit semantic redundancy. In this paper, we propose a progressive semantic communication framework for edge-cloud VLM inference. It uses a Meta AutoEncoder to compress visual tokens into adaptive, progressively refinable representations, enabling plug-and-play deployment with off-the-shelf VLMs without additional fine-tuning. This design allows transmission at different information levels, providing a controllable trade-off between communication cost and semantic fidelity. We implement a full end-to-end edge-cloud system comprising an embedded NXP i.MX95 platform and a GPU server that communicate over bandwidth-constrained networks. Experimental results show that, at a 1 Mbps uplink, the proposed progressive scheme significantly reduces network latency compared to full-edge and full-cloud solutions, while maintaining high semantic consistency even under high compression. The implementation code will be released upon publication at https://github.com/open-ep/ProSemComVLM.
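To make the trade-off between communication cost and semantic fidelity concrete, the following is a minimal sketch of how an edge device might decide how many refinement levels of a progressive representation to send under a latency budget. It is not the paper's implementation: the names `Level`, `transmit_seconds`, and `select_levels`, the per-level payload sizes, and the 0.25 s budget are all assumptions chosen for illustration. Only the 1 Mbps uplink figure comes from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Level:
    """One refinement level of a progressive representation (hypothetical)."""
    name: str
    payload_bytes: int


def transmit_seconds(payload_bytes: int, uplink_bps: float) -> float:
    """Ideal serialization time for a payload, ignoring protocol overhead and RTT."""
    return payload_bytes * 8 / uplink_bps


def select_levels(
    levels: Sequence[Level], uplink_bps: float, budget_s: float
) -> Tuple[List[Level], int]:
    """Greedily keep the longest prefix of levels that fits the latency budget.

    Levels are assumed to be ordered coarse-to-fine, so a prefix is always
    decodable. Returns the chosen levels and their total size in bytes; the
    list is empty if even the first level does not fit.
    """
    chosen: List[Level] = []
    total_bytes = 0
    for level in levels:
        candidate = total_bytes + level.payload_bytes
        if transmit_seconds(candidate, uplink_bps) > budget_s:
            break
        chosen.append(level)
        total_bytes = candidate
    return chosen, total_bytes


# Illustrative sizes only; real sizes depend on the encoder and token count.
LEVELS = [
    Level("base", 4_000),
    Level("refine-1", 8_000),
    Level("refine-2", 16_000),
    Level("refine-3", 32_000),
]

if __name__ == "__main__":
    budget_s = 0.25  # assumed latency budget for the uplink transfer
    for uplink_bps in (1e6, 5e5, 2e5):
        chosen, total = select_levels(LEVELS, uplink_bps, budget_s)
        names = [level.name for level in chosen]
        print(f"{uplink_bps / 1e6:.1f} Mbps: {names} ({total} bytes)")
```

Under these assumed sizes, a slower link simply yields a shorter prefix of levels. A fixed-size representation lacks this ability to degrade gracefully.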