This paper presents a novel vision transformer (ViT) based deep joint source-channel coding (DeepJSCC) scheme, dubbed DeepJSCC-l++, which adapts to multiple target bandwidth ratios and to different channel signal-to-noise ratios (SNRs) using a single model. To achieve this, we train the proposed DeepJSCC-l++ model over different bandwidth ratios and SNRs, which are fed to the model as side information. The reconstruction loss is calculated for each bandwidth ratio, and a new training methodology is proposed that dynamically assigns different weights to these losses according to the reconstruction quality achieved at each bandwidth ratio. The shifted window (Swin) transformer is adopted as the backbone of the DeepJSCC-l++ model. Extensive simulations show that the proposed DeepJSCC-l++ and successive refinement models can adapt to different bandwidth ratios and channel SNRs with marginal performance loss compared to separately trained models. We also observe that the proposed schemes can outperform the digital baseline, which concatenates BPG compression with a capacity-achieving channel code.
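To make the idea of quality-dependent loss weighting concrete, the following is a minimal sketch and not the weighting rule used in the paper, which the abstract does not specify. It assumes a hypothetical rule: each bandwidth ratio is weighted by a softmax over the gap (in dB) between a reference PSNR and the PSNR currently achieved at that ratio, so that ratios lagging further behind contribute more to the training loss. The function names, the `temperature` parameter, the reference PSNR values, and the example numbers are all illustrative assumptions.

```python
import math


def dynamic_loss_weights(psnr_db, reference_psnr_db, temperature=1.0):
    """Return one weight per bandwidth ratio; the weights sum to 1.

    Hypothetical rule (an assumption, not taken from the paper): a ratio
    whose current PSNR lags further behind its reference PSNR receives a
    larger weight, computed as a softmax over the gap in dB.
    """
    if len(psnr_db) != len(reference_psnr_db):
        raise ValueError("need one reference PSNR per bandwidth ratio")
    gaps = [ref - cur for cur, ref in zip(psnr_db, reference_psnr_db)]
    peak = max(gaps)  # subtract the maximum gap for numerical stability
    exps = [math.exp((gap - peak) / temperature) for gap in gaps]
    total = sum(exps)
    return [value / total for value in exps]


def weighted_loss(losses, weights):
    """Combine the per-bandwidth-ratio losses into a single scalar."""
    return sum(w * loss for w, loss in zip(weights, losses))


# Illustrative values only: PSNR (dB) at three bandwidth ratios.
current_psnr = [27.0, 30.5, 33.0]
reference_psnr = [28.0, 31.0, 33.2]  # assumed per-ratio reference quality

# For images normalized to [0, 1], PSNR = 10 * log10(1 / MSE).
mse_losses = [10 ** (-p / 10) for p in current_psnr]

weights = dynamic_loss_weights(current_psnr, reference_psnr)
total_loss = weighted_loss(mse_losses, weights)
print(weights, total_loss)
```

In a training loop, the weights would be recomputed periodically from validation PSNRs, and `temperature` would control how strongly the loss concentrates on the worst-performing bandwidth ratio; both choices are assumptions of this sketch.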