This paper presents an N-gram context-based Swin Transformer for learned image compression. Our method achieves variable-rate compression with a single model. By incorporating N-gram context into the Swin Transformer, we overcome its tendency to neglect larger regions during high-resolution image reconstruction, a consequence of its restricted receptive field. This enhancement expands the regions referenced during pixel restoration, thereby improving the quality of high-resolution reconstructions. By increasing context awareness across neighboring windows, our method achieves a 5.86\% BD-Rate reduction over existing variable-rate learned image compression techniques. Additionally, our model improves reconstruction quality in regions of interest (ROI), making it particularly beneficial for object-focused applications in fields such as manufacturing and industrial vision systems.