Large Language Models (LLMs) have transformed natural language processing, yet efficiently storing and managing prompts in production environments remains challenging. This paper presents LoPace (Lossless Optimized Prompt Accurate Compression Engine), a novel compression framework designed specifically for prompt storage in LLM applications. LoPace employs three compression strategies: Zstandard-based compression, Byte-Pair Encoding (BPE) tokenization with binary packing, and a hybrid method that combines the two. Evaluating LoPace on 386 diverse prompts, including code snippets, markdown documentation, and structured content, we show that it achieves an average space savings of 72.2\% while guaranteeing 100\% lossless reconstruction. The hybrid method consistently outperforms either technique alone, attaining a mean compression ratio of 4.89x (range: 1.22--19.09x) at throughputs of 3.3--10.7 MB/s. Our findings indicate that LoPace is production-ready, with a small memory footprint (0.35 MB on average) and strong scalability for large prompt databases and real-time LLM applications.
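The two building blocks named above can be illustrated with a minimal sketch. This is not the LoPace implementation: stdlib `zlib` stands in for Zstandard, and a toy unsigned-16-bit packing of token IDs stands in for BPE tokenization with binary packing; all function names are hypothetical.

```python
import struct
import zlib

def compress_prompt(text: str) -> bytes:
    """General-purpose compression path (zlib standing in for Zstandard)."""
    return zlib.compress(text.encode("utf-8"), level=9)

def decompress_prompt(blob: bytes) -> str:
    """Inverse of compress_prompt; byte-exact round trip."""
    return zlib.decompress(blob).decode("utf-8")

def pack_tokens(token_ids: list[int]) -> bytes:
    """Binary-pack token IDs as little-endian unsigned 16-bit ints
    (toy stand-in for BPE tokenization + binary packing)."""
    return struct.pack(f"<{len(token_ids)}H", *token_ids)

def unpack_tokens(blob: bytes) -> list[int]:
    """Recover the exact token ID sequence from the packed bytes."""
    return list(struct.unpack(f"<{len(blob) // 2}H", blob))

# Lossless round trip on a repetitive, code-like prompt.
prompt = "def add(a, b):\n    return a + b\n" * 20
blob = compress_prompt(prompt)
assert decompress_prompt(blob) == prompt  # 100% lossless reconstruction
ratio = len(prompt.encode("utf-8")) / len(blob)
print(f"compression ratio: {ratio:.2f}x")

ids = [101, 7, 42, 300]
assert unpack_tokens(pack_tokens(ids)) == ids
```

A hybrid scheme in this spirit would, for example, tokenize first and then apply the general-purpose compressor to the packed token stream, keeping the round trip byte-exact at every stage.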