Modern cryptographic methods for implementing privacy-preserving LLMs, such as \gls{HE}, require the LLM to be expressed in polynomial form. Obtaining such a representation is challenging because transformers include non-polynomial components such as \Softmax and layer normalization. Previous approaches have taken one of two routes. Some directly approximated pre-trained models with high-degree polynomials, which are less efficient under HE. Others replaced non-polynomial components with easier-to-approximate primitives before training, e.g., \Softmax with pointwise attention, an approach that may introduce scalability challenges. We present a new HE-friendly variant of self-attention that is stable to train and easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs with over a billion parameters, more than ten times larger than previous models. The resulting models demonstrate reasoning and in-context learning (ICL) capabilities comparable to those of standard transformers of the same size, representing a breakthrough in the field. Finally, we provide a detailed latency breakdown for each computation over encrypted data, paving the way for further optimization, and explore how the inductive bias of models that rely on our HE-friendly variant differs from that of standard transformers.
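The cost of directly approximating a pre-trained model can be made concrete with a small experiment. This is a generic illustration, not the self-attention variant proposed here. The sketch below uses NumPy's `Chebyshev.fit` to fit polynomials of increasing degree to $e^x$, the non-polynomial core of \Softmax, over an assumed input range of $[-8, 0]$. The range and the helper name `max_fit_error` are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def max_fit_error(degree, lo=-8.0, hi=0.0, n=2001):
    """Least-squares fit of a degree-`degree` Chebyshev series to exp on [lo, hi].

    Returns the maximum absolute error on a dense grid. The interval is an
    illustrative assumption about where attention scores lie, not a value
    taken from the paper.
    """
    x = np.linspace(lo, hi, n)
    p = Chebyshev.fit(x, np.exp(x), degree)  # maps [lo, hi] onto [-1, 1] internally
    return float(np.max(np.abs(p(x) - np.exp(x))))


if __name__ == "__main__":
    for d in (2, 4, 8, 16):
        print(f"degree {d:2d}: max error on [-8, 0] = {max_fit_error(d):.2e}")
    # Same degree, wider input range: the approximation degrades sharply.
    print(f"degree  8: max error on [-32, 0] = {max_fit_error(8, lo=-32.0):.2e}")
```

The error shrinks only as the degree grows. Widening the range that inputs may occupy makes a fixed degree far less accurate. Under HE, higher degrees require more ciphertext multiplications and greater multiplicative depth. This is the efficiency trade-off that motivates making the attention mechanism itself easy to approximate.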