Recent years have seen increasing concern about private inference for NLP services and Transformer models. However, existing two-party privacy-preserving methods consider only NLU scenarios, while private inference for text generation tasks such as translation, dialogue, and code completion remains unsolved. Moreover, when migrated to NLG models, existing privacy-preserving methods perform poorly in inference speed and suffer from convergence problems during training. To address these issues, we propose MERGE, a fast private text generation framework for Transformer-based language models. Specifically, MERGE reuses the output hidden state as the word embedding to bypass the embedding computation, and reorganizes the linear operations in the Transformer module to accelerate the forward procedure. Extensive experiments show that, with these two optimizations, MERGE achieves a 26.5x speedup at sequence length 512, reduces communication bytes by 80\%, and achieves up to a 10x speedup over existing state-of-the-art models.
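The following plaintext sketch (no secret sharing) illustrates why the two optimizations can save work. It is not MERGE's implementation, and all function names are hypothetical. First, when a token id is secret-shared, a party cannot index the embedding table directly, so a lookup is commonly written as a one-hot vector times the full vocabulary table. That full (1 x V) by (V x d) product is the kind of cost that reusing the output hidden state as the next embedding avoids. Second, as one assumed example of reorganizing linear operations, two consecutive linear maps with no nonlinearity between them can be folded offline into a single weight matrix, so only one secure multiplication is needed at inference time.

```python
# Plaintext illustration only; names are hypothetical, not MERGE's actual code.

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product a @ b."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def onehot_embedding(token_id: int, table: Matrix) -> list[float]:
    """Embedding lookup written as one-hot @ table.

    Under secret sharing the index is hidden, so the lookup turns into a
    full (1 x V) @ (V x d) multiplication over the whole vocabulary.
    """
    onehot = [[1.0 if i == token_id else 0.0 for i in range(len(table))]]
    return matmul(onehot, table)[0]


def fold_linear(w1: Matrix, w2: Matrix) -> Matrix:
    """Precompute w1 @ w2 offline so (x @ w1) @ w2 needs one multiplication.

    Valid only when no nonlinearity sits between the two linear maps.
    """
    return matmul(w1, w2)


if __name__ == "__main__":
    table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    print(onehot_embedding(1, table))   # -> [0.3, 0.4], same as table[1]

    x = [[1.0, 2.0]]
    w1 = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
    w2 = [[1.0], [2.0], [3.0]]
    print(matmul(matmul(x, w1), w2))    # two multiplications -> [[17.0]]
    print(matmul(x, fold_linear(w1, w2)))  # one online multiplication -> [[17.0]]
```

Folding trades a one-time offline product for fewer online multiplications; in a two-party protocol each online multiplication costs communication, which is why reducing their number matters more than reducing plaintext FLOPs.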