Efficient Semantic Image Communication for Traffic Monitoring at the Edge

Many visual monitoring systems operate under strict communication constraints, where transmitting full-resolution images is impractical and frequently unnecessary. In such settings, visual data is typically used to establish object presence, spatial relationships, and scene context rather than to reproduce exact pixel values. This paper presents two semantic image communication pipelines for traffic monitoring, MMSD and SAMR, that reduce transmission cost while preserving meaningful visual information. MMSD (Multi-Modal Semantic Decomposition) targets very high compression together with data confidentiality, since sensitive pixel content is not transmitted. It replaces the original image with compact semantic representations, namely segmentation maps, edge maps, and textual descriptions, and reconstructs the scene at the receiver using a diffusion-based generative model. SAMR (Semantic-Aware Masking Reconstruction) targets higher visual quality while still achieving strong compression. It selectively suppresses non-critical image regions according to their semantic importance before standard JPEG encoding, and the receiver restores the missing content through generative inpainting. Both designs follow an asymmetric sender-receiver architecture in which lightweight processing runs at the edge and computationally intensive reconstruction is offloaded to a server. On a Raspberry Pi 5, edge-side processing takes about 15 s for MMSD and 9 s for SAMR. Experimental results show average reductions in transmitted data of 99% for MMSD and 99.1% for SAMR. In addition, MMSD achieves a smaller payload than the recent SPIC baseline while preserving strong semantic consistency, whereas SAMR offers a better quality-compression trade-off than standard JPEG and SQ-GAN under comparable operating conditions.
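To make the SAMR sender-side idea concrete, the following is a toy sketch, not the authors' implementation: pixels whose semantic label is non-critical are overwritten with a constant, so the encoder spends almost no bits on them. Several pieces are assumptions introduced only for illustration: the critical class IDs (`CRITICAL_CLASSES`), the fill value, the grayscale list-of-lists image format, and the helper names `suppress_noncritical`, `encode`, and `data_reduction` are all hypothetical. Because Python's standard library has no JPEG encoder, `zlib` stands in for JPEG here, and the reduction is measured against raw pixel bytes, which may differ from the reference size used in the paper. The receiver-side generative inpainting step is not shown.

```python
import random
import zlib

# Hypothetical semantic class IDs treated as "critical" (e.g. 1 = vehicle,
# 2 = pedestrian). The class list actually used by SAMR is not specified here.
CRITICAL_CLASSES = {1, 2}
FILL_VALUE = 0  # assumed constant written into suppressed (non-critical) pixels


def suppress_noncritical(pixels, labels, critical=CRITICAL_CLASSES, fill=FILL_VALUE):
    """Keep pixels whose semantic label is critical; overwrite all others with `fill`.

    `pixels` (grayscale 0-255) and `labels` (class IDs) are equally sized 2-D lists.
    """
    return [
        [p if lab in critical else fill for p, lab in zip(prow, lrow)]
        for prow, lrow in zip(pixels, labels)
    ]


def encode(pixels):
    """Stand-in encoder: zlib over the raw bytes. SAMR itself uses standard JPEG,
    which the Python standard library does not provide."""
    raw = bytes(v for row in pixels for v in row)
    return zlib.compress(raw, level=9)


def data_reduction(original_size, transmitted_size):
    """Fraction of data saved; 0.99 means only 1% of the original is sent."""
    return 1.0 - transmitted_size / original_size


# Toy 64x64 grayscale "scene": random texture everywhere, with a 16x16 region
# labelled as a vehicle (class 1) and everything else as background (class 0).
rng = random.Random(0)
H, W = 64, 64
pixels = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
labels = [[1 if 24 <= r < 40 and 24 <= c < 40 else 0 for c in range(W)] for r in range(H)]

masked = suppress_noncritical(pixels, labels)
raw_size = H * W  # one byte per grayscale pixel
full_size = len(encode(pixels))
masked_size = len(encode(masked))
print(f"raw={raw_size} B, unmasked={full_size} B, masked={masked_size} B, "
      f"reduction={data_reduction(raw_size, masked_size):.1%}")
```

The same arithmetic reads the reported figures: a 99% reduction means roughly one hundredth of the original bytes is sent, and 99.1% means roughly one 111th. A faithful reproduction would swap `encode` for a real JPEG encoder and pair it with an inpainting model at the receiver.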

翻译：许多视觉监控系统在严格的通信约束下运行，传输全分辨率图像既不具实用性，也往往无此必要。在此类场景中，视觉数据常被用于目标存在性、空间关系和场景上下文分析，而非追求精确的像素保真度。本文针对交通监控提出两种语义图像通信流水线——MMSD与SAMR——在减少传输开销的同时保留有意义的视觉信息。MMSD（多模态语义分解）旨在实现超高压缩率与数据机密性，因敏感像素内容不予传输。该方法以紧凑的语义表征（即分割图、边缘图和文本描述）替代原始图像，并通过基于扩散的生成模型在接收端重建场景。SAMR（语义感知掩码重建）在维持高压缩率的同时追求更优的视觉质量。该方法根据语义重要性选择性抑制非关键图像区域，再经标准JPEG编码后，通过生成式修复在接收端恢复缺失内容。两种设计均采用非对称的发送端-接收端架构：边缘端执行轻量级处理，而计算密集型的重建任务则卸载至服务器。在Raspberry Pi~5上，边缘端处理时间对MMSD约为15秒，对SAMR约为9秒。实验结果表明，MMSD与SAMR的平均传输数据压缩比分别达99%与99.1%。此外，MMSD在保持强语义一致性的前提下，实现了比近期SPIC基线更低的负载大小；而SAMR在可比运行条件下，相较于标准JPEG与SQ-GAN展现出更优的质量-压缩权衡。