Privacy-preserving machine learning (PPML) helps us train and deploy models that use private information. In particular, on-device machine learning lets us avoid sharing raw data with a third-party server during inference. On-device models are typically less accurate than their server counterparts because (1) they usually rely on only a small set of on-device features and (2) they must be small enough to run efficiently on end-user devices. Split Learning (SL) is a promising approach that can overcome these limitations. In SL, a large machine learning model is divided into two parts: the larger part resides on the server, and the smaller part runs on-device so that it can incorporate the private features. However, end-to-end training of such models requires exchanging gradients at the cut layer, and these gradients might encode private features or labels. In this paper, we provide insights into the potential privacy risks associated with SL and investigate the effectiveness of various mitigation strategies. Our results indicate that the gradients significantly improve the attackers' effectiveness on all tested datasets, reaching almost perfect reconstruction accuracy for some features. However, a small amount of differential privacy (DP) can effectively mitigate this risk without causing significant training degradation.
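To make the mitigation concrete, the following is a minimal sketch of how gradients crossing the cut layer could be privatized before they are sent to the other party. The abstract does not specify the DP mechanism, so this assumes the common recipe of per-example L2 clipping followed by Gaussian noise. The function names and the parameters `max_norm` and `noise_multiplier` are hypothetical and chosen for illustration only.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Return a copy of grad rescaled so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def privatize_cut_layer_gradients(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's cut-layer gradient, then add Gaussian noise.

    Assumption: the noise standard deviation is noise_multiplier * max_norm,
    as in the usual Gaussian mechanism; the paper may use a different scheme.
    """
    sigma = noise_multiplier * max_norm
    noisy = []
    for grad in per_example_grads:
        clipped = clip_by_l2_norm(grad, max_norm)
        noisy.append([g + rng.gauss(0.0, sigma) for g in clipped])
    return noisy


# Toy cut-layer gradients for two examples (illustrative values only).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_grads = privatize_cut_layer_gradients(
    grads, max_norm=1.0, noise_multiplier=0.5, rng=rng
)
```

Clipping bounds how much any single example can influence the exchanged gradient, and the added noise masks what remains. A larger `noise_multiplier` gives stronger protection at the cost of noisier training signals, which is the trade-off the abstract refers to when it reports that a small amount of DP suffices.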