Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations by hiding hidden states with fresh, per-batch invertible mixing. For each offloaded projection, a trusted execution environment (TEE) samples a random matrix $A$, forms $U = AH$, offloads $U$ and the weights $W$ to the accelerator, and then applies $A^{-1}$ on return, so that $A^{-1}((AH)W) = HW$ and outputs are unchanged. Because mixing is never reused across batches, the attacker faces only a single-batch blind source separation problem. We analyse information leakage and introduce two practical defences: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy "shield" vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20-30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.
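The core mixing identity can be illustrated with a minimal NumPy sketch. This is a toy illustration with small, arbitrary dimensions, not the paper's implementation: the TEE-side and accelerator-side steps are simulated in one process, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 4, 8, 6  # toy sizes: batch rows, hidden dim, output dim
H = rng.standard_normal((n, d))  # hidden states (secret, stay in the TEE)
W = rng.standard_normal((d, k))  # model weights (public)

# TEE side: sample a fresh invertible mixing matrix A for this batch.
# A generic Gaussian matrix is invertible with probability 1.
A = rng.standard_normal((n, n))
U = A @ H  # mixed hidden states, offloaded to the accelerator

# Accelerator side: performs the heavy matmul on mixed inputs only.
Y_mixed = U @ W  # (AH)W -- the accelerator never sees H

# TEE side: unmix the result; solve(A, .) applies A^{-1} without
# explicitly forming the inverse.
Y = np.linalg.solve(A, Y_mixed)

# The unmixed output matches the plain computation.
assert np.allclose(Y, H @ W)
```

A per-batch resample of `A` is what reduces the attacker's problem to single-batch blind source separation; the orthogonal and shield-vector variants described above would change how `A` is drawn, not this identity.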