Training machine learning models on data from multiple entities without direct data sharing can unlock applications otherwise hindered by business, legal, or ethical constraints. In this work, we design and implement new privacy-preserving machine learning protocols for logistic regression and neural network models. We adopt a two-server model in which data owners secret-share their data between two servers that train and evaluate the model on the joint data. A significant source of inefficiency and inaccuracy in existing methods is the use of Yao's garbled circuits to compute non-linear activation functions. We propose new methods for computing non-linear functions based on secret-shared lookup tables, offering both computational efficiency and improved accuracy. Beyond introducing leakage-free techniques, we initiate the exploration of relaxed security measures for privacy-preserving machine learning. Instead of claiming that the servers learn nothing during the computation, we contend that although some information about access patterns to the lookup tables is revealed, this leakage preserves ε-d_X-privacy. Leveraging this relaxation significantly reduces the computational resources needed for training. We present new cryptographic protocols tailored to this relaxed security paradigm, and we define and analyze the leakage. Our evaluations show that, compared with SecureML, our logistic regression protocol is up to 9x faster and our neural network training is up to 688x faster. Notably, our neural network achieves an accuracy of 96.6% on MNIST in 15 epochs, outperforming prior work with the same architecture, which reached at most 93.4%.
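The two-server setting described above rests on secret sharing. As a minimal sketch, assuming two-party additive sharing over the ring of integers modulo 2^64, the following Python shows how a data owner might split a value into two shares and how each server can evaluate a linear function on its own shares without seeing the inputs. The ring size and the names `share`, `reconstruct`, `add_shares`, and `scale_share` are illustrative assumptions, not the protocols of this work; the lookup-table and non-linear steps are not shown.

```python
import secrets

# Ring size is an assumption for illustration, not taken from the paper.
MODULUS = 2 ** 64


def share(value: int) -> tuple[int, int]:
    """Split value into two additive shares modulo MODULUS."""
    share0 = secrets.randbelow(MODULUS)   # uniformly random mask
    share1 = (value - share0) % MODULUS   # complement so the shares sum to value
    return share0, share1


def reconstruct(share0: int, share1: int) -> int:
    """Recombine the two shares into the original value."""
    return (share0 + share1) % MODULUS


def add_shares(a: int, b: int) -> int:
    """Local addition of two shares held by the same server."""
    return (a + b) % MODULUS


def scale_share(a: int, constant: int) -> int:
    """Local multiplication of a share by a public constant."""
    return (a * constant) % MODULUS


# Two data owners secret-share their private inputs between server 0 and server 1.
x0, x1 = share(25)
y0, y1 = share(17)

# Each server evaluates 3*x + y using only the shares it holds.
z0 = add_shares(scale_share(x0, 3), y0)
z1 = add_shares(scale_share(x1, 3), y1)

# Only the combination of both servers' outputs reveals the result.
result = reconstruct(z0, z1)
print(result)
```

Each share on its own is uniformly distributed, so a single server learns nothing about the input from it. Linear operations are local in this sketch; non-linear functions such as activations need interaction between the servers, which is where the abstract contrasts garbled circuits with secret-shared lookup tables.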