OmniZip: Learning a Unified and Lightweight Lossless Compressor for Multi-Modal Data

Lossless compression is essential for efficient data storage and transmission. Although learning-based lossless compressors achieve strong results, most are designed for a single modality, which leads to redundant compressor deployments in multi-modal settings. Designing a unified multi-modal compressor is critical yet challenging, because different data types vary widely in format, dimension, and statistics. Multi-modal large language models offer a promising solution but remain too complex for practical use. We therefore propose \textbf{OmniZip}, \textbf{a unified and lightweight lossless compressor for multi-modal data (such as images, text, speech, tactile data, databases, and gene sequences)}. Built on a lightweight backbone, OmniZip incorporates three key components to enable efficient multi-modal lossless compression: a modality-unified tokenizer that reversibly transforms diverse data into tokens, a modality-routing context learning mechanism that enables flexible multi-modal context modeling, and a modality-routing feedforward design that further enhances the model's nonlinear representation flexibility. A reparameterization training strategy is used to enhance model capacity. OmniZip outperforms or matches other state-of-the-art compressors on multiple modalities, achieving 42\%, 57\%, 62\%, 42\%, and 53\% higher compression efficiency than gzip on the CLIC-M, TouchandGo, enwik9, LibriSpeech, and WikiSQL datasets, respectively. It also supports near real-time inference on resource-constrained edge devices, reaching about 1 MB/s on MacBook CPUs and iPhone NPUs. Our code is released at https://github.com/adminasmi/OmniZip-CVPR2026.
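
To make the idea of a reversible, modality-unified tokenizer concrete, the following is a minimal sketch, not OmniZip's actual tokenizer. It assumes a byte-level vocabulary plus one leading modality tag per sequence; the names `MODALITIES`, `BYTE_VOCAB`, `encode`, and `decode` are hypothetical and chosen only for illustration. The point it illustrates is that lossless compression requires the token mapping to be exactly invertible.

```python
from typing import List, Tuple

# Hypothetical modality tags (assumption; the abstract does not specify
# how modalities are represented in the token vocabulary).
MODALITIES = ["image", "text", "speech", "tactile", "database", "gene"]
BYTE_VOCAB = 256  # token ids 0..255 are reserved for raw byte values


def modality_token(modality: str) -> int:
    """Map a modality name to a token id placed above the byte range."""
    return BYTE_VOCAB + MODALITIES.index(modality)


def encode(data: bytes, modality: str) -> List[int]:
    """Turn raw bytes into tokens: one modality tag, then one token per byte."""
    return [modality_token(modality)] + list(data)


def decode(tokens: List[int]) -> Tuple[str, bytes]:
    """Invert encode(): recover the modality name and the original bytes."""
    modality = MODALITIES[tokens[0] - BYTE_VOCAB]
    return modality, bytes(tokens[1:])


if __name__ == "__main__":
    sample = b"ACGT"
    tokens = encode(sample, "gene")
    # The round trip must reproduce the input exactly for lossless coding.
    assert decode(tokens) == ("gene", sample)
```

In a learned compressor of this kind, the token sequence would typically be fed to a probability model whose predictions drive an entropy coder; the modality tag is one plausible way for a single model to condition on the data type.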
