Recent years have seen increasing concerns about the private inference of NLP services and Transformer models. However, existing two-party privacy-preserving methods consider only NLU scenarios, while the private inference of text generation tasks such as translation, dialogue, and code completion remains unsolved. Moreover, when migrated to NLG models, existing privacy-preserving methods are slow at inference and suffer from convergence problems during training. To address these issues, we propose MERGE, a fast private text generation framework for Transformer-based language models. Specifically, MERGE reuses the output hidden state as the word embedding to bypass the embedding computation, and reorganizes the linear operations in the Transformer module to accelerate the forward procedure. With these two optimizations, extensive experiments show that MERGE achieves a 26.5x speedup at sequence length 512, reduces communication bytes by 80\%, and is up to 10x faster than existing state-of-the-art models.
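To make the first optimization concrete, the following toy sketch contrasts a standard autoregressive loop (decode a token, then look up its embedding) with a loop that feeds the output hidden state back as the next input embedding. It is an illustration under stated assumptions, not the MERGE implementation: `toy_step` is a hypothetical stand-in for a Transformer forward pass, the weights are random, and no cryptographic protocol is modeled. It only shows which per-step operations disappear from the loop.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def toy_step(weights, embedding):
    """Hypothetical stand-in for one Transformer forward pass."""
    return [math.tanh(v) for v in matvec(weights, embedding)]


def argmax_token(embedding_table, hidden):
    """Score the hidden state against every embedding and pick the best token."""
    logits = matvec(embedding_table, hidden)
    return max(range(len(logits)), key=logits.__getitem__)


def generate_standard(weights, embedding_table, start_token, steps):
    """Standard loop: every step decodes a token and looks up its embedding."""
    tokens = []
    x = embedding_table[start_token]
    for _ in range(steps):
        hidden = toy_step(weights, x)
        token = argmax_token(embedding_table, hidden)  # per-step decoding
        tokens.append(token)
        x = embedding_table[token]  # per-step embedding lookup
    return tokens


def generate_reusing_hidden(weights, embedding_table, start_token, steps):
    """Sketch of hidden-state reuse: the hidden state is the next input."""
    hiddens = []
    x = embedding_table[start_token]
    for _ in range(steps):
        hidden = toy_step(weights, x)
        hiddens.append(hidden)
        x = hidden  # no decoding and no embedding lookup inside the loop
    # Tokens are decoded once, after the loop has finished.
    return [argmax_token(embedding_table, h) for h in hiddens]


# Illustrative sizes and random parameters (assumptions, not from the paper).
DIM, VOCAB, STEPS = 4, 6, 5
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
embedding_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]

standard = generate_standard(weights, embedding_table, 0, STEPS)
reused = generate_reusing_hidden(weights, embedding_table, 0, STEPS)
print("standard:", standard)
print("reused:  ", reused)
```

With an untrained toy model the two loops generally diverge after the first token. The abstract's mention of the training stage suggests that the model is trained so that feeding back hidden states still yields useful generations.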