We present Nacrith, a lossless compression system that combines a 135M-parameter transformer language model (SmolLM2-135M) with an ensemble of lightweight online predictors and a 32-bit arithmetic coder, achieving the strongest compression on natural language text among all systems evaluated in this study. Beyond the base LLM-plus-arithmetic-coding paradigm, Nacrith introduces several contributions: (1) a CDF precision upgrade from 2^16 to 2^24 that eliminates the ~75% quantization overhead imposed by minimum-probability floors on large vocabularies; (2) a token-level N-gram model for fast local predictions; (3) an adaptive log-space bias head that corrects per-document LLM errors via online gradient descent; (4) confidence-based LLM skipping that accelerates highly predictable tokens; (5) a hybrid binary format (NC06) that extends neural compression to arbitrary binary files, to our knowledge a first among LLM-based compressors; (6) a llama.cpp inference backend achieving ~7x faster single-token decoding than PyTorch; (7) parallel multi-GPU compression across up to 8 workers; and (8) a native KV-cache sliding window that reduces per-slide cost by ~37x. The system requires only ~500 MB of GGUF weights and ~1.2 GB of VRAM per worker, and runs on consumer GPUs. On alice29 (Canterbury Corpus, 152 KB), Nacrith achieves 0.918 bits per byte (bpb), outperforming gzip by 3.1x, bzip2 by 2.5x, CMIX v21 by 44%, and ts_zip by 20%, while compressing below the 0th-, 1st-, and 2nd-order byte-level Shannon entropy bounds. On enwik8 (100 MB), Nacrith achieves 0.9389 bpb (an 11.74% compression ratio), surpassing ts_zip (~1.11 bpb) by 15% and FineZip (1.024 bpb) by 8% despite using a 60x smaller model with no fine-tuning. An out-of-distribution (OOD) evaluation on a document published after the model's training cutoff confirms that these gains are not memorization artifacts: Nacrith achieves 0.723 bpb on the unseen text.
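To make the headline numbers concrete: under the LLM-plus-arithmetic-coding paradigm, each token costs approximately its negative log-probability under the model, and bpb is the total cost divided by the raw byte length. A minimal Python sketch of the ideal coding cost follows; `next_token_probs` is a hypothetical stand-in for any autoregressive predictor, not Nacrith's actual API.

```python
import math

def ideal_coding_cost_bits(tokens, next_token_probs):
    """Ideal arithmetic-coding cost of a token stream: sum of -log2 p(token).
    `next_token_probs(prefix)` is a hypothetical stand-in for any
    autoregressive model; it returns a probability per vocabulary entry."""
    bits = 0.0
    for i, tok in enumerate(tokens):
        p = next_token_probs(tokens[:i])[tok]  # model probability of the true token
        bits += -math.log2(p)                  # arithmetic coding pays ~this many bits
    return bits

# bits per byte (bpb) = total bits / raw byte length of the document.
# At 0.918 bpb, a ~152 KB file compresses to about 152_000 * 0.918 / 8 ~= 17.4 KB.
```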
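The ~75% figure in contribution (1) falls out of simple arithmetic if one assumes a SmolLM2-style vocabulary of 49,152 tokens and a floor of one quantization unit per token; both are assumptions about details not fixed in the abstract.

```python
VOCAB_SIZE = 49_152  # assumed SmolLM2 tokenizer size (an assumption, not stated above)

def floor_overhead(precision_bits: int) -> float:
    """Fraction of the quantized probability mass consumed when every
    vocabulary entry is floored to at least 1 quantization unit."""
    total_units = 1 << precision_bits
    return VOCAB_SIZE / total_units

print(f"2^16 precision: {floor_overhead(16):.1%} of mass lost to floors")  # 75.0%
print(f"2^24 precision: {floor_overhead(24):.1%} of mass lost to floors")  # ~0.3%
```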
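Contribution (3) can be pictured as a per-document additive bias on the LLM's logits, updated online with the gradient of the same cross-entropy the coder pays. The sketch below is illustrative only: the learning rate and the exact parameterization of Nacrith's bias head are assumptions.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()           # numerical stability
    e = np.exp(z)
    return e / e.sum()

class LogBiasHead:
    """Additive log-space correction b applied to LLM logits, learned online.
    Sketch only: the learning rate and update rule are illustrative assumptions."""
    def __init__(self, vocab_size: int, lr: float = 0.02):
        self.b = np.zeros(vocab_size)
        self.lr = lr

    def predict(self, logits: np.ndarray) -> np.ndarray:
        return softmax(logits + self.b)   # corrected distribution fed to the coder

    def update(self, logits: np.ndarray, target: int) -> None:
        p = self.predict(logits)
        grad = p.copy()                   # d(cross-entropy)/d(b) = p - onehot(target)
        grad[target] -= 1.0
        self.b -= self.lr * grad          # one online gradient step per token
```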
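Contribution (4) admits an equally small sketch: when a cheap online predictor is confident enough, the token is coded from that distribution and the LLM forward pass is skipped. The threshold value is an illustrative assumption; the key property is that the decision depends only on state the decoder also has, so encoder and decoder stay in sync.

```python
import numpy as np

def choose_coding_distribution(fast_probs: np.ndarray, run_llm, threshold: float = 0.995):
    """Sketch of confidence-based LLM skipping. `fast_probs` comes from a cheap
    predictor (e.g. the N-gram model); `run_llm` is a hypothetical callable
    producing the full LLM distribution. 0.995 is an assumed threshold."""
    if fast_probs.max() >= threshold:
        return fast_probs     # highly predictable token: no LLM call
    return run_llm()          # otherwise pay for the full LLM forward pass
```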
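Finally, the claim of compressing below the 0th-, 1st-, and 2nd-order byte-level Shannon bounds can be checked against the standard empirical estimator of order-k conditional entropy; the sketch below is the textbook definition, not code from Nacrith.

```python
from collections import Counter
from math import log2

def order_k_entropy(data: bytes, k: int) -> float:
    """Empirical order-k conditional entropy H(X_n | X_{n-k}..X_{n-1}), in bits/byte."""
    n = len(data) - k
    ctx_counts = Counter(data[i:i + k] for i in range(n))       # k-byte contexts
    pair_counts = Counter(data[i:i + k + 1] for i in range(n))  # context + next byte
    h = 0.0
    for pair, c in pair_counts.items():
        p_pair = c / n                      # P(context, symbol)
        p_cond = c / ctx_counts[pair[:k]]   # P(symbol | context)
        h += -p_pair * log2(p_cond)
    return h

# data = open("alice29.txt", "rb").read()
# for k in (0, 1, 2):
#     print(k, order_k_entropy(data, k))
```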