Multi-tenant LLM serving frameworks widely adopt shared key-value (KV) caches to improve efficiency. However, this sharing creates side-channel vulnerabilities that enable prompt leakage attacks. Prior studies identified these attack surfaces but focused on broadening attack vectors rather than optimizing attack performance, and the impractically high attack costs they report understate the true privacy risk. We propose OptiLeak, a reinforcement learning-enhanced framework that maximizes prompt reconstruction efficiency through two-stage fine-tuning. Our key insight is that domain-specific ``hard tokens'' (terms that are difficult to predict yet carry sensitive information) can be automatically identified via likelihood ranking and used to construct preference pairs for Direct Preference Optimization, eliminating manual annotation. This enables effective preference alignment while avoiding the overfitting issues of extended supervised fine-tuning. Evaluated on three benchmarks spanning medical and financial domains, OptiLeak achieves up to a $12.48\times$ reduction in average requests per token compared to baseline approaches, with consistent improvements across model scales from 3B to 14B parameters. Our findings demonstrate that cache-based prompt leakage poses a more severe threat than previously reported, underscoring the need for robust cache isolation in production deployments.
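To make the hard-token idea concrete, the following is a minimal sketch, not the paper's released implementation, of how low-likelihood tokens in a prompt could be flagged via likelihood ranking and folded into DPO-style preference pairs. The model name, top-$k$ cutoff, and pair format are illustrative assumptions only.

```python
# Illustrative sketch (not OptiLeak's actual code): rank prompt tokens by base-LM
# likelihood to flag "hard tokens", then build DPO-style preference pairs.
# Model name, cutoff k, and pair schema are assumptions for demonstration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-3B"  # placeholder base model; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

def token_likelihoods(prompt: str):
    """Return (token, log-prob) for each prompt token under the base LM."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits                      # [1, seq_len, vocab]
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    target = ids[:, 1:]                              # next-token targets
    tok_lp = logprobs.gather(-1, target.unsqueeze(-1)).squeeze(-1)[0]
    return list(zip(tok.convert_ids_to_tokens(target[0].tolist()), tok_lp.tolist()))

def hard_tokens(prompt: str, k: int = 5):
    """Lowest-likelihood tokens: hard for the model to predict, hence likely
    domain-specific and information-bearing."""
    scored = token_likelihoods(prompt)
    return sorted(scored, key=lambda x: x[1])[:k]

def make_preference_pair(prefix: str, true_completion: str, model_completion: str):
    """DPO preference pair: the ground-truth continuation containing hard tokens
    is 'chosen'; the base model's own guess (missing them) is 'rejected'."""
    return {"prompt": prefix, "chosen": true_completion, "rejected": model_completion}
```

Pairs in this format can be fed directly to a standard DPO trainer, so the ranking step alone replaces manual annotation of which continuations are preferred.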