STCC: A Unified Source-Channel Semantic Token Coding Framework for Semantic Communications

Deep Joint Source-Channel Coding (JSCC) has emerged as a promising paradigm for overcoming the "cliff effect" in wireless communications. However, existing Deep JSCC frameworks operate directly on raw analog data, such as image pixels, rather than on the discrete semantic tokens that foundation models require. Moreover, traditional systems employ fixed, hand-designed constellations that treat all tokens equally, so channel noise produces catastrophic random errors. This paper proposes Semantic Token Codebook Communication (STCC), a unified source-channel semantic token coding framework designed to transmit the discrete semantic tokens of foundation models over noisy channels. The core of STCC is the Semantic Token Codec (STC). The STC accepts discrete tokens as input, which maintains compatibility with foundation models, and employs a residual multilayer perceptron (MLP)-based encoder that learns geometrically structured constellations optimized with a triple-loss objective. This learned mapping forces the channel topology to align with the semantic embedding space, so that channel noise results in topological errors rather than random corruption. This phenomenon is characterized both theoretically and empirically, identifying "Semantic Drift" in symbolic modalities and "Structural Distortion" in perceptual modalities, in which errors shift predictions to semantically or structurally similar tokens. Extensive experiments demonstrate that STCC significantly outperforms traditional systems in low-SNR regimes, effectively converting channel noise into semantic variations without requiring receiver-side modification.
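The intuition that a semantically aligned constellation turns channel noise into errors on nearby tokens can be illustrated with a toy sketch. This is not the STC itself: nothing is learned, the symbols are one-dimensional real values, and every name here (`semantic_pos`, `aligned`, `arbitrary`, `mean_error_distance`), the vocabulary size, and the noise level are assumptions made purely for illustration. The sketch compares a constellation whose geometry follows a hypothetical 1-D semantic embedding with one that assigns the same symbols to tokens arbitrarily, and measures how far, in semantic terms, a decoding error lands from the transmitted token.

```python
import random


def nearest_token(received, constellation):
    """Minimum-distance decoding: the token whose symbol is closest to `received`."""
    return min(range(len(constellation)),
               key=lambda t: abs(constellation[t] - received))


def mean_error_distance(constellation, semantic_pos, noise_std, trials, rng):
    """Average semantic distance between sent and decoded token, over errors only."""
    total, errors = 0.0, 0
    for _ in range(trials):
        sent = rng.randrange(len(constellation))
        # Additive Gaussian noise on the transmitted symbol.
        received = constellation[sent] + rng.gauss(0.0, noise_std)
        decoded = nearest_token(received, constellation)
        if decoded != sent:
            total += abs(semantic_pos[sent] - semantic_pos[decoded])
            errors += 1
    return total / errors if errors else 0.0


rng = random.Random(0)

# Hypothetical 1-D semantic embedding: token i sits at coordinate i,
# so tokens with close indices are "semantically similar".
semantic_pos = [0, 1, 2, 3, 4, 5, 6, 7]

# Aligned constellation: symbol geometry mirrors the semantic geometry.
aligned = [float(p) for p in semantic_pos]

# Arbitrary constellation: the same eight symbols, assigned to tokens
# by a fixed permutation unrelated to semantics.
arbitrary = [3.0, 6.0, 0.0, 5.0, 2.0, 7.0, 1.0, 4.0]

aligned_dist = mean_error_distance(aligned, semantic_pos, 0.6, 20000, rng)
arbitrary_dist = mean_error_distance(arbitrary, semantic_pos, 0.6, 20000, rng)

print(f"mean semantic distance of errors (aligned):   {aligned_dist:.2f}")
print(f"mean semantic distance of errors (arbitrary): {arbitrary_dist:.2f}")
```

In this toy setting both constellations have the same symbol error rate, since the symbol set is identical; only the semantic severity of each error differs. Under these assumptions, that difference is the property the abstract attributes to the learned mapping.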
