Large Language Models (LLMs) can, in principle, model text at information densities far beyond the reach of classical compression methods (e.g., Lempel-Ziv). Exploiting this capability for lossless compression, however, runs into severe system constraints, including non-deterministic hardware and prohibitive computational cost. In this work, we present an exploratory study of the feasibility of LLM-based archival systems. We introduce \textbf{Hybrid-LLM}, a proof-of-concept architecture designed to investigate the ``entropic capacity'' of foundation models in a storage context. \textbf{We identify a critical barrier to deployment:} the ``GPU Butterfly Effect,'' in which microscopic hardware non-determinism changes the model's predictions between encoding and decoding and thereby precludes data recovery. We resolve this with a novel logit quantization protocol, which enables rigorous measurement of neural compression rates on real-world data. Our experiments reveal a clear divergence between ``retrieval-based'' density (0.39 bits per character (BPC) on memorized literature) and ``predictive'' density (0.75 BPC on unseen news). Although current inference latency ($\approx 2600\times$ slower than Zstd) limits immediate deployment to ultra-cold storage, our findings demonstrate that LLMs capture semantic redundancy inaccessible to classical algorithms, establishing a baseline for future research into semantic file systems.
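To make the idea of logit quantization concrete, the following is a minimal sketch and not the protocol used in Hybrid-LLM, whose details the abstract does not give. It assumes that logits are snapped to a quarter-bit grid and then mapped to integer frequencies with a precomputed table, so that no floating-point arithmetic runs after the quantization step. The function names, the grid size, and the 16-bit table are all hypothetical choices made for illustration. The last two helpers show how a BPC figure follows from ideal code lengths.

```python
import math

# 2**16 * 2**(-k/4) for k = 0..3, stored as integers so that no
# floating-point math is needed after the quantization step.
# (Illustrative choice, not taken from the paper.)
QUARTER_BIT_TABLE = [65536, 55109, 46341, 38968]


def quantize_logits(logits):
    """Snap natural-log logits to integer multiples of a quarter bit."""
    scale = 4.0 / math.log(2.0)  # converts nats to quarter-bit units
    return [round(x * scale) for x in logits]


def integer_frequencies(q_logits):
    """Map quantized logits to positive integer frequencies (integer-only)."""
    top = max(q_logits)
    freqs = []
    for q in q_logits:
        d = top - q  # distance below the most likely symbol, in quarter bits
        # 2**(-d/4) split into a table lookup and a right shift.
        weight = QUARTER_BIT_TABLE[d % 4] >> (d // 4)
        freqs.append(weight + 1)  # +1 keeps every symbol encodable
    return freqs


def code_length_bits(freqs, symbol):
    """Ideal arithmetic-coding cost of `symbol` under `freqs`, in bits."""
    return math.log2(sum(freqs)) - math.log2(freqs[symbol])


def bits_per_character(total_bits, num_chars):
    """Compression rate in bits per character (BPC)."""
    return total_bits / num_chars


# Hypothetical logits: the decoder's values differ from the encoder's by
# tiny perturbations, standing in for hardware non-determinism.
encoder_logits = [2.0, 1.0, 0.1, -3.0]
decoder_logits = [2.000001, 0.999999, 0.1000005, -3.000001]

encoder_table = integer_frequencies(quantize_logits(encoder_logits))
decoder_table = integer_frequencies(quantize_logits(decoder_logits))
print(encoder_table == decoder_table)
```

Rounding alone does not remove the problem entirely: a logit that sits very close to a grid boundary can still round differently on the two sides, so a complete protocol would need an additional safeguard for such cases. The sketch only illustrates why coarse, integer-valued probability tables tolerate small numerical drift.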