Good-Enough LLM Obfuscation (GELO)

Large language models (LLMs) are increasingly served on shared accelerators, where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted-accelerator observations by masking hidden states with fresh, per-batch invertible mixing. For each offloaded projection, the trusted execution environment (TEE) samples a random invertible matrix $A$, forms $U = AH$, offloads $U$ and the weights $W$ to the accelerator, and applies $A^{-1}$ to the returned product, so that $A^{-1}((AH)W) = (A^{-1}A)HW = HW$ and outputs are unchanged. Because a mixing matrix is never reused across batches, the attacker faces only a single-batch blind source separation (BSS) problem. We analyze information leakage and introduce two practical defenses: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy "shield" vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20-30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.
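The mix, offload, unmix round trip for one projection can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sample_mixing`, `offloaded_projection`), the tiny dimensions, and the use of exact rational arithmetic from the standard library are all assumptions made for illustration, and the accelerator is stood in for by an ordinary local matrix product. Exact rationals are used only so that the identity $A^{-1}((AH)W) = HW$ can be checked with plain equality; the sketch does not model floating-point behavior, the Gram-matrix leakage analysis, or shield vectors.

```python
import random
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Gauss-Jordan inverse over exact rationals; raises ValueError if singular."""
    n = len(M)
    # Augment M with the identity: [M | I].
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def random_matrix(rows, cols, rng):
    """Small random integer matrix, stored as exact Fractions."""
    return [[Fraction(rng.randint(-9, 9)) for _ in range(cols)] for _ in range(rows)]


def sample_mixing(n, rng):
    """Draw a fresh n x n mixing matrix A and its inverse (hypothetical helper).

    Singular draws are simply rejected and resampled.
    """
    while True:
        A = random_matrix(n, n, rng)
        try:
            return A, inverse(A)
        except ValueError:
            continue


def offloaded_projection(H, W, rng):
    """Trusted-side wrapper: mix H, 'offload' the matmul, then unmix the result."""
    A, A_inv = sample_mixing(len(H), rng)   # fresh A on every call, never reused
    U = matmul(A, H)                        # the accelerator sees U, not H
    Y_mixed = matmul(U, W)                  # stand-in for the untrusted accelerator
    return matmul(A_inv, Y_mixed), U        # A^{-1}((AH)W) = HW


rng = random.Random(0)
H = random_matrix(4, 3, rng)   # 4 token rows, hidden size 3 (toy sizes)
W = random_matrix(3, 2, rng)   # projection weights
Y, U = offloaded_projection(H, W, rng)
Y2, U2 = offloaded_projection(H, W, rng)  # second call samples a new A
print(Y == matmul(H, W), Y2 == matmul(H, W))
```

Each call to `offloaded_projection` draws its own mixing matrix, which mirrors the per-batch freshness the protocol relies on: the same `H` yields the same output `HW` on both calls, while the mixed matrix handed to the accelerator is derived from an independently sampled `A` each time.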
