Split Without a Leak: Reducing Privacy Leakage in Split Learning

The popularity of Deep Learning (DL) makes protecting sensitive data more important than ever. As a result, various privacy-preserving techniques have been developed to protect user data in DL. Among them, collaborative learning techniques such as Split Learning (SL) have been used to accelerate the learning and prediction process. SL was initially considered a promising approach to data privacy. However, subsequent research has demonstrated that SL is susceptible to many types of attacks and therefore cannot serve as a privacy-preserving technique by itself. Meanwhile, countermeasures that combine SL with encryption have been introduced to achieve privacy-preserving deep learning. In this work, we propose a hybrid approach using SL and Homomorphic Encryption (HE). The idea is that the client encrypts the activation map (the output of the split layer between the client and the server) before sending it to the server. Hence, during both forward and backward propagation, the server cannot reconstruct the client's input data from the intermediate activation map. This improvement is important because it reduces privacy leakage compared to other SL-based works, where the server can gain valuable information about the client's input. In addition, on the MIT-BIH dataset, our hybrid approach trains about 6 times faster and incurs almost 160 times less communication overhead than other HE-based approaches. It thereby offers improved privacy protection for sensitive data in DL.
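The core idea (the client encrypts the activation map, and the server computes on ciphertexts only) can be illustrated with a minimal sketch. This is not the scheme or implementation used in this work: it swaps in a toy Paillier cryptosystem (additively homomorphic only) with insecurely small keys, covers only the forward pass through a single server-side linear layer, and assumes a fixed-point encoding. All names (`keygen`, `server_linear`, `SCALE`, and so on) are hypothetical and chosen for illustration. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

SCALE = 1000  # fixed-point scale for encoding floats as integers (assumption)

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two Mersenne primes (far too small to be secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (possibly negative) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt ciphertext c and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def client_encrypt_activations(n, acts):
    """Client side: encrypt the activation map before sending it."""
    return [encrypt(n, round(a * SCALE)) for a in acts]

def server_linear(n, enc_acts, weights, bias):
    """Server side: compute W*a + b on ciphertexts using only the public key.

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext to a
    plaintext weight multiplies the plaintext by that weight.
    """
    n2 = n * n
    out = []
    for w_row, b in zip(weights, bias):
        # Encode the plaintext bias at scale SCALE^2 to match weight * activation.
        acc = (1 + (round(b * SCALE * SCALE) % n) * n) % n2
        for c, w in zip(enc_acts, w_row):
            acc = acc * pow(c, round(w * SCALE) % n, n2) % n2
        out.append(acc)
    return out

def client_decrypt_outputs(n, priv, enc_out):
    """Client side: decrypt the server's result and undo the fixed-point scaling."""
    return [decrypt(n, priv, c) / (SCALE * SCALE) for c in enc_out]

if __name__ == "__main__":
    n, priv = keygen()
    activations = [0.5, -1.25, 2.0]            # output of the client's split layer
    weights = [[0.1, 0.2, -0.3], [1.0, 0.0, 0.5]]
    bias = [0.05, -0.1]
    enc = client_encrypt_activations(n, activations)
    enc_out = server_linear(n, enc, weights, bias)  # server never sees plaintext
    print(client_decrypt_outputs(n, priv, enc_out))
```

In this sketch the server holds only the public modulus `n` and the ciphertexts, so it can evaluate its layer without seeing the activation values. A scheme that supports approximate arithmetic on real-valued vectors would be a more natural fit for neural-network layers than this integer-only toy.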
