Power-Softmax: Towards Secure LLM Inference over Encrypted Data

Modern cryptographic methods for implementing privacy-preserving LLMs such as homomorphic encryption (HE) require the LLMs to have a polynomial form. Forming such a representation is challenging because transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention. The latter approach might introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs over a billion parameters, exceeding the size of previous models by more than tenfold. The resulting models demonstrate reasoning and in-context learning (ICL) capabilities comparable to standard transformers of the same size, representing a breakthrough in the field. Finally, we provide a detailed latency breakdown for each computation over encrypted data, paving the way for further optimization, and explore the differences in inductive bias between models relying on our HE-friendly variant and standard transformers.
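The abstract does not define the HE-friendly attention variant. As a minimal sketch, assuming (from the title) that it replaces the exponential in Softmax with an even integer power, the contrast with standard Softmax might look like the following. The function names, the exponent `p`, and the stabilizer `eps` are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Standard softmax; exp() is not a polynomial, so it is costly over HE."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def power_softmax(scores, p=2, eps=1e-6):
    """Assumed power-based variant: x_i^p / (sum_j x_j^p + eps), p even.

    The numerator is a plain polynomial in the scores. In this sketch only
    the division by the sum would still need a polynomial approximation
    for inference over encrypted data.
    """
    if p <= 0 or p % 2 != 0:
        raise ValueError("p must be a positive even integer")
    powers = [s ** p for s in scores]  # even power keeps every term >= 0
    total = sum(powers) + eps          # eps (hypothetical) avoids division by zero
    return [v / total for v in powers]


weights = power_softmax([1.0, -2.0, 3.0], p=2)
```

In this sketch a score with large magnitude receives a large weight regardless of its sign, which is one way such a variant could differ in inductive bias from standard Softmax.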

