This paper presents a novel vision transformer (ViT) based deep joint source-channel coding (DeepJSCC) scheme, dubbed DeepJSCC-l++, which adapts to multiple target bandwidth ratios and to different channel signal-to-noise ratios (SNRs) using a single model. To achieve this, we train the proposed DeepJSCC-l++ model over different bandwidth ratios and SNRs, which are fed to the model as side information. The reconstruction losses corresponding to the different bandwidth ratios are calculated, and a new training methodology is proposed that dynamically assigns different weights to these losses according to their individual reconstruction qualities. A shifted window (Swin) transformer is adopted as the backbone of the DeepJSCC-l++ model. Extensive simulations show that the proposed DeepJSCC-l++ and successive refinement models can adapt to different bandwidth ratios and channel SNRs with marginal performance loss compared to separately trained models. We also observe that the proposed schemes can outperform the digital baseline, which concatenates BPG compression with a capacity-achieving channel code.
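The idea of weighting the per-bandwidth-ratio losses by reconstruction quality can be illustrated with a minimal sketch. The abstract does not state the actual weighting rule, so everything below is an assumption made for illustration: the softmax form, the `temperature` parameter, the use of a PSNR gap to a reference model as the quality measure, and the function names `dynamic_loss_weights` and `weighted_loss` are all hypothetical and are not taken from the paper.

```python
import math


def dynamic_loss_weights(psnr_gaps_db, temperature=1.0):
    """Map per-bandwidth-ratio quality gaps (in dB) to loss weights.

    Hypothetical rule (not the paper's): a softmax over the gaps, so a
    bandwidth ratio that lags further behind its reference quality
    receives a larger share of the total loss.
    """
    if not psnr_gaps_db:
        raise ValueError("need at least one bandwidth ratio")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [gap / temperature for gap in psnr_gaps_db]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(losses, weights):
    """Combine per-bandwidth-ratio reconstruction losses into one scalar."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, losses))


# Illustrative numbers only: three bandwidth ratios, with made-up PSNR gaps
# to a reference model and made-up reconstruction losses (e.g. MSE).
gaps = [0.2, 0.5, 1.5]
losses = [0.010, 0.006, 0.004]

weights = dynamic_loss_weights(gaps)
total = weighted_loss(losses, weights)
print("weights:", [round(w, 3) for w in weights])
print("combined loss:", total)
```

In a training loop, the weights would be recomputed periodically from validation quality and treated as constants during backpropagation; that scheduling choice is likewise an assumption of this sketch.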