Learned image compression (LIC) methods have already surpassed traditional techniques in compressing natural scene (NS) images. However, directly applying these methods to screen content (SC) images, which possess distinct characteristics such as sharp edges, repetitive patterns, and embedded text and graphics, yields suboptimal results. This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and overcoming the lack of large SC datasets. To this end, we propose a novel compression method that employs a multi-frequency two-stage octave residual block (MToRB) for feature extraction, a cascaded triple-scale feature fusion residual block (CTSFRB) for multi-scale feature integration, and a multi-frequency context interaction module (MFCIM) to reduce inter-frequency correlations. Additionally, we introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. Furthermore, we construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms. Experimental results demonstrate that our approach significantly improves SC image compression performance, outperforming both traditional standards and state-of-the-art learning-based methods in terms of peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).
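The idea of scaled uniform noise as a stand-in for quantization can be illustrated with a minimal sketch. It is not the paper's adaptive quantization module: the abstract does not specify the architecture, so the function names, the two component labels ("low" and "high"), and the fixed step sizes below are all hypothetical. In a real LIC model the step sizes would be learned parameters; here they are constants chosen only to show how a per-component scale changes quantization granularity.

```python
import random

def noisy_quantize(values, step, rng):
    """Training-time proxy (assumed): add uniform noise in [-step/2, step/2].

    Scaling the noise by `step` mimics rounding to a grid of spacing `step`
    while keeping the operation differentiable with respect to the inputs.
    """
    return [v + rng.uniform(-0.5, 0.5) * step for v in values]

def hard_quantize(values, step):
    """Inference-time counterpart: round each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]

# Hypothetical per-frequency step sizes; a trained model would learn these.
steps = {"low": 0.5, "high": 0.25}

# Hypothetical latent values for each frequency component.
latents = {"low": [1.3, -0.7], "high": [1.3, -0.7]}

rng = random.Random(0)
for name, values in latents.items():
    noisy = noisy_quantize(values, steps[name], rng)
    hard = hard_quantize(values, steps[name])
    print(name, "noisy:", noisy, "hard:", hard)
```

A smaller step gives a finer grid, and therefore lower distortion at a higher bit cost, for that component; a larger step does the opposite. Letting each frequency component carry its own scale is what allows the granularity to differ between them.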