Interactive computer vision (CV) plays a crucial role in many real-world applications, and its performance depends heavily on the underlying communication networks. However, the data-oriented design of conventional communications often fails to meet the specific needs of interactive CV tasks. Semantic communications, which transmit only task-related semantic information, have recently emerged as a promising way to address this mismatch. Yet the communication challenges of semantic facial editing, one of the most important interactive CV applications on social media, remain largely unexplored. In this paper, we fill this gap by proposing Editable-DeepSC, a novel cross-modal semantic communication approach for facial editing. First, we theoretically discuss transmission schemes that handle communication and editing separately, and emphasize the necessity of Joint Editing-Channel Coding (JECC) via iterative attribute matching, which integrates editing into the communication chain to preserve more semantic mutual information. Second, to compactly represent the high-dimensional data, we leverage inversion methods based on pre-trained StyleGAN priors for semantic coding. Third, to cope with dynamic channel noise conditions, we propose SNR-aware channel coding via model fine-tuning. Extensive experiments indicate that Editable-DeepSC achieves superior editing quality while significantly reducing transmission bandwidth, even under high-resolution and out-of-distribution (OOD) settings.
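To make the notion of channel noise at a given SNR concrete, the following is a minimal sketch and not the method of Editable-DeepSC. It assumes an additive white Gaussian noise (AWGN) channel acting on a real-valued latent code, which the abstract does not specify. The function names `awgn_channel` and `empirical_snr_db` are hypothetical and introduced here only for illustration.

```python
import math
import random


def awgn_channel(latent, snr_db, rng=None):
    """Add white Gaussian noise to `latent` so that the SNR equals `snr_db` (in dB).

    Assumption: a real-valued AWGN channel; the source does not state the channel model.
    """
    if rng is None:
        rng = random.Random()
    # Average power of the transmitted vector.
    signal_power = sum(x * x for x in latent) / len(latent)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in latent]


def empirical_snr_db(clean, noisy):
    """Measure the SNR (in dB) actually realized between `clean` and `noisy`."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for a latent code; a real system would obtain it from an encoder.
    latent = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    for snr_db in (0.0, 10.0, 20.0):
        received = awgn_channel(latent, snr_db, rng)
        print(snr_db, round(empirical_snr_db(latent, received), 2))
```

Under these assumptions, a channel coder that is "SNR-aware" would be trained or fine-tuned on latent codes corrupted at the SNR levels it is expected to face, so that decoding stays robust as `snr_db` varies.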