Communication-Efficient Multi-Modal Edge Inference via Uncertainty-Aware Distributed Learning

Semantic communication is emerging as a key enabler for distributed edge intelligence because it conveys task-relevant meaning rather than raw data. However, achieving communication-efficient training and robust inference over wireless links remains challenging. For multi-modal edge inference (MMEI), this challenge is exacerbated by two factors: 1) prohibitive communication overhead for distributed learning over bandwidth-limited wireless links, due to the \emph{multi-modal} nature of the system; and 2) limited robustness under varying channels and noisy multi-modal inputs. In this paper, we propose a three-stage communication-aware distributed learning framework that improves training and inference efficiency while maintaining robustness over wireless channels. In Stage~I, devices perform local multi-modal self-supervised learning to obtain shared and modality-specific encoders without device--server exchange, thereby reducing the communication cost. In Stage~II, distributed fine-tuning with centralized evidential fusion calibrates per-modality uncertainty and reliably aggregates features distorted by noise or channel fading. In Stage~III, an uncertainty-guided feedback mechanism selectively requests additional features for uncertain samples, improving the communication--accuracy tradeoff in the distributed setting. Experiments on RGB--depth indoor scene classification show that the proposed framework attains higher accuracy with far fewer training communication rounds and remains robust to modality degradation and channel variation, outperforming existing self-supervised and fully supervised baselines.
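To make Stages~II and~III concrete, the following is a minimal sketch of evidential fusion followed by an uncertainty-triggered feedback decision. The abstract does not specify the fusion rule, so the sketch assumes a common choice from evidential deep learning: each modality outputs non-negative class evidence, which is mapped to a subjective-logic opinion (belief masses plus an uncertainty mass), and two opinions are merged with the reduced Dempster combination rule. The function names, the example evidence values, and the threshold of 0.3 are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Opinion = Tuple[List[float], float]  # (per-class belief masses, uncertainty mass)


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Map non-negative class evidence to a subjective-logic opinion.

    Assumes Dirichlet parameters alpha_k = e_k + 1, so the Dirichlet
    strength is S = sum(e_k) + K, belief is e_k / S, and uncertainty is K / S.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


def fuse(op1: Opinion, op2: Opinion) -> Opinion:
    """Combine two opinions with the reduced Dempster rule (assumed here)."""
    b1, u1 = op1
    b2, u2 = op2
    agreement = sum(x * y for x, y in zip(b1, b2))
    # Conflict: belief mass that the two modalities place on different classes.
    conflict = sum(b1) * sum(b2) - agreement
    scale = 1.0 - conflict
    belief = [(x * y + x * u2 + y * u1) / scale for x, y in zip(b1, b2)]
    uncertainty = (u1 * u2) / scale
    return belief, uncertainty


def needs_more_features(uncertainty: float, threshold: float = 0.3) -> bool:
    """Stage III style decision: request extra features only if uncertain.

    The threshold is a hypothetical tuning knob, not a value from the paper.
    """
    return uncertainty > threshold


# Hypothetical evidence for a 3-class problem: a confident RGB branch and a
# depth branch whose features were degraded (for example by channel fading).
rgb_opinion = opinion_from_evidence([8.0, 1.0, 1.0])
depth_opinion = opinion_from_evidence([0.5, 0.2, 0.3])
fused_belief, fused_uncertainty = fuse(rgb_opinion, depth_opinion)
request_feedback = needs_more_features(fused_uncertainty)
```

In this sketch the degraded modality carries a large uncertainty mass, so it contributes little to the fused belief, and the server would ask a device for additional features only when the fused uncertainty stays above the threshold. Raising the threshold reduces feedback traffic at the cost of accepting less certain predictions.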
