Screen content (SC) differs from natural scene (NS) content in several respects: it is largely noise-free, contains repetitive patterns, and exhibits high contrast. To address the inadequacies of current learned image compression (LIC) methods on SC, we propose an improved two-stage octave convolutional residual block (IToRB) for extracting high- and low-frequency features, and a cascaded two-stage multi-scale residual block (CTMSRB) for improved multi-scale learning and nonlinearity on SC. Additionally, we employ a window-based attention module (WAM) to capture pixel correlations, especially in high-contrast regions of the image. We also construct a diverse SC image compression dataset (SDU-SCICD2K) for training, which includes text, charts, graphics, animation, movie, and game images, as well as images that mix SC and NS content. Experimental results show that our method, which is better suited to SC than to NS data, outperforms existing LIC methods in rate-distortion performance on SC images. The code is publicly available at https://github.com/SunshineSki/OMR Net.git.
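To give a rough intuition for the high- and low-frequency separation that octave-style blocks rely on, the following minimal sketch splits a small patch into a half-resolution low-frequency band and a full-resolution high-frequency residual. It is an illustration under simplifying assumptions, not the IToRB described above: it uses fixed 2x2 averaging instead of learned convolutions, and all function names (`avg_pool_2x2`, `upsample_nearest_2x`, `split_frequencies`, `merge_frequencies`) are hypothetical.

```python
# Illustrative only: a hand-rolled low/high-frequency split with fixed filters.
# This is NOT the paper's IToRB; every name here is hypothetical.

def avg_pool_2x2(img):
    """Downsample a 2-D list (even height and width) by averaging 2x2 blocks."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def upsample_nearest_2x(img):
    """Upsample a 2-D list by repeating each value over a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def split_frequencies(img):
    """Return (low, high): half-resolution low band, full-resolution residual."""
    low = avg_pool_2x2(img)
    up = upsample_nearest_2x(low)
    high = [
        [img[r][c] - up[r][c] for c in range(len(img[0]))]
        for r in range(len(img))
    ]
    return low, high


def merge_frequencies(low, high):
    """Invert split_frequencies by adding the upsampled low band back."""
    up = upsample_nearest_2x(low)
    return [
        [up[r][c] + high[r][c] for c in range(len(high[0]))]
        for r in range(len(high))
    ]


# A 4x4 patch with a sharp, shifted vertical edge, as found in text or UI borders.
patch = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [0, 255, 255, 255],
]
low, high = split_frequencies(patch)
restored = merge_frequencies(low, high)
```

In this toy patch the flat regions fall entirely into the low band, while the high band is nonzero only where the edge cuts through a 2x2 block, which is the kind of sharp, high-contrast structure that is common in SC.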