Large language models (LLMs) have recently seen widespread adoption in both academia and industry. As these models grow, they become valuable intellectual property (IP) that reflects substantial investment by their owners. Moreover, the high cost of cloud-based deployment has driven interest in deploying models on edge devices, but doing so risks exposing valuable parameters to theft and unauthorized use. Current methods for protecting a model's IP on the edge are limited in practicality, incur a loss in accuracy, or are unsuited to deployment requirements. In this paper, we introduce SLIP, a novel hybrid inference algorithm designed to protect edge-deployed models from theft. SLIP is the first hybrid protocol that is both practical for real-world applications and provably secure, with zero accuracy degradation and minimal impact on latency. It partitions the model between two computing resources: one secure but expensive, the other cost-effective but vulnerable. The partition is obtained through matrix decomposition, so that the secure resource retains a maximally sensitive portion of the model's IP while performing a minimal amount of computation, whereas the vulnerable resource holds the less sensitive remainder and performs most of the computation. Importantly, the protocol includes security guarantees that prevent attackers from exploiting the partition to infer the secured information. Finally, we present experimental results demonstrating the robustness and effectiveness of our method, positioning it as a compelling solution for protecting LLMs.
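As a rough illustration of how a matrix decomposition can split a single linear layer between the two resources, the sketch below assumes a truncated-SVD split. This is our choice for illustration only; the abstract does not specify which decomposition SLIP uses. The secure side keeps the top-`k` singular triplets in factored form, and the vulnerable side keeps the dense residual. The function names `split_weight`, `secure_matvec`, and `hybrid_matvec` and the rank `k` are hypothetical, and NumPy is assumed to be available. The sketch only shows why such a split can be lossless and cheap for the secure side; it does not implement the SLIP protocol or its security guarantees.

```python
import numpy as np


def split_weight(W: np.ndarray, k: int):
    """Hypothetical split of W into a rank-k secure part and a public residual."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Secure side keeps the k largest singular triplets in factored form
    # (assumption for illustration: these are treated as the sensitive part).
    secure = (U[:, :k], S[:k], Vt[:k, :])
    # Vulnerable side gets the dense residual, which carries none of the top-k components.
    W_public = W - U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
    return secure, W_public


def secure_matvec(secure, x: np.ndarray) -> np.ndarray:
    U_k, S_k, Vt_k = secure
    # Factored product costs O(k * (m + n)) instead of O(m * n) for an m x n matrix.
    return U_k @ (S_k * (Vt_k @ x))


def hybrid_matvec(secure, W_public: np.ndarray, x: np.ndarray) -> np.ndarray:
    # The two partial results sum to W @ x, so the split is exact up to rounding.
    return secure_matvec(secure, x) + W_public @ x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32))
    x = rng.standard_normal(32)
    secure, W_public = split_weight(W, k=4)
    print(np.allclose(hybrid_matvec(secure, W_public, x), W @ x))  # True
```

Because the two partial outputs add up to `W @ x`, accuracy is unaffected up to floating-point rounding, and the secure side's cost grows with `k` rather than with the full size of the matrix. Any protection against an attacker combining what the vulnerable side sees with observed outputs is outside the scope of this sketch.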