Traditional methods, such as JPEG, perform image compression by operating on structural information, such as pixel values or frequency content. These methods are effective at bitrates of about one bit per pixel (bpp) and above at standard image sizes. In contrast, text-based semantic compression directly stores concepts and their relationships using natural language, which has evolved alongside humans to efficiently represent salient concepts. Such methods can operate at extremely low bitrates by disregarding structural information like location, size, and orientation. In this work, we use GPT-4V and DALL-E3 from OpenAI to explore the quality-compression frontier for image compression and to identify the limitations of current technology. We push semantic compression as low as 100 $\mu$bpp (up to $10,000\times$ smaller than JPEG) by introducing an iterative reflection process that improves the decoded image. We further hypothesize that this 100 $\mu$bpp level represents a soft limit on semantic compression at standard image resolutions.
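To get a feel for the bitrate regime the abstract describes, consider the bits-per-pixel cost of storing a short natural-language description instead of pixel data. The sketch below is only an illustration of the arithmetic, not the paper's actual coder; the sample caption, image resolution, and the use of a generic `zlib` entropy coder are all assumptions for the example.

```python
import zlib

# Hypothetical caption standing in for a semantic description of an image.
description = "A red fox leaping over a snow-covered log at sunrise."
width, height = 1024, 1024  # assumed "standard" image resolution

# Compress the text with a generic entropy coder and compute bits per pixel.
compressed_bits = len(zlib.compress(description.encode("utf-8"))) * 8
bpp = compressed_bits / (width * height)

print(f"{bpp * 1e6:.0f} microbits per pixel")
```

Even with this naive text coder, a one-sentence description lands in the sub-millibit-per-pixel range, several orders of magnitude below JPEG's ~1 bpp operating point; reaching 100 µbpp requires squeezing the semantic payload down to only a few dozen bits.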