Big data scenarios, where massive, heterogeneous datasets are distributed across clients, demand scalable, privacy-preserving learning methods. Federated learning (FL) enables decentralized training of machine learning (ML) models across clients without data centralization. Decentralized training, however, places a computational burden on client devices. U-shaped federated split learning (UFSL) offloads a fraction of the client computation to the server while keeping both data and labels on the client side. However, the intermediate representations (i.e., smashed data) that clients share with the server are prone to exposing clients' private data. To reduce this exposure, this work proposes k-anonymous differentially private UFSL (KD-UFSL), which leverages privacy-enhancing techniques, namely microaggregation and differential privacy, to minimize data leakage from the smashed data transferred to the server. We first demonstrate that an adversary can reconstruct private client data from the intermediate representations via a data-reconstruction attack, and then present KD-UFSL as a privacy-enhancing solution to mitigate this risk. Our experiments on four benchmark datasets indicate that KD-UFSL increases the mean squared error between the actual and reconstructed images by up to 50% in some cases, while also decreasing the structural similarity between them by up to 40%. More importantly, KD-UFSL improves privacy while preserving the utility of the global model. This highlights its suitability for large-scale big data applications where privacy and utility must be balanced.