We have recently witnessed that ``Intelligence'' and ``Compression'' are two sides of the same coin, where a large language model (LLM) with unprecedented intelligence is a general-purpose lossless compressor for various data modalities. This attribute particularly appeals to the lossless image compression community, given the increasing demand to compress high-resolution images in the current streaming-media era. Consequently, a natural question arises: can the compression performance of the LLM elevate lossless image compression to new heights? However, our findings indicate that the naive application of LLM-based lossless image compressors suffers from a considerable performance gap compared with existing state-of-the-art (SOTA) codecs on common benchmark datasets. In light of this, we are dedicated to realizing the unprecedented intelligence (compression) capacity of the LLM for lossless image compression tasks, thereby bridging the gap between theoretical and practical compression performance. Specifically, we propose P$^{2}$-LLM, a next-pixel-prediction-based LLM, which integrates various elaborated insights and methodologies, \textit{e.g.,} pixel-level priors, the in-context ability of the LLM, and a pixel-level semantic preservation strategy, to enhance its understanding of pixel sequences for better next-pixel prediction. Extensive experiments on benchmark datasets demonstrate that P$^{2}$-LLM can beat SOTA classical and learned codecs.
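For intuition on why prediction and compression are two sides of the same coin, consider that any autoregressive model assigning a probability to each next pixel lets an arithmetic coder store a pixel sequence in roughly $-\sum_i \log_2 p(x_i \mid x_{<i})$ bits; stronger predictors therefore yield smaller files. The following minimal Python sketch illustrates this link, where a hypothetical Laplace-smoothed adaptive frequency model stands in for the LLM's next-pixel predictor (it is not the P$^{2}$-LLM pipeline itself):
\begin{verbatim}
import numpy as np

def codelength_bits(pixels):
    """Ideal code length (in bits) of a flattened 8-bit pixel sequence
    under a simple adaptive next-pixel predictor."""
    counts = np.ones(256)                  # Laplace-smoothed counts per value
    total_bits = 0.0
    for p in pixels:
        prob = counts[p] / counts.sum()    # model's next-pixel probability
        total_bits += -np.log2(prob)       # bits an arithmetic coder spends
        counts[p] += 1                     # adapt the predictor online
    return total_bits

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=64 * 64)   # stand-in 64x64 grayscale image
bpp = codelength_bits(img) / img.size
print(f"{bpp:.3f} bits per pixel")         # better predictions -> fewer bits
\end{verbatim}
Replacing this toy frequency model with an LLM's next-pixel distribution is precisely what makes the model's ``intelligence'' directly measurable as compression performance.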