Hawk: Accurate and Fast Privacy-Preserving Machine Learning Using Secure Lookup Table Computation

Training machine learning models on data from multiple entities without direct data sharing can unlock applications otherwise hindered by business, legal, or ethical constraints. In this work, we design and implement new privacy-preserving machine learning protocols for logistic regression and neural network models. We adopt a two-server model where data owners secret-share their data between two servers that train and evaluate the model on the joint data. A significant source of inefficiency and inaccuracy in existing methods arises from using Yao's garbled circuits to compute non-linear activation functions. We propose new methods for computing non-linear functions based on secret-shared lookup tables, offering both computational efficiency and improved accuracy. Beyond introducing leakage-free techniques, we initiate the exploration of relaxed security measures for privacy-preserving machine learning. Instead of claiming that the servers gain no knowledge during the computation, we contend that while some information is revealed about access patterns to lookup tables, it maintains epsilon-dX-privacy. Leveraging this relaxation significantly reduces the computational resources needed for training. We present new cryptographic protocols tailored to this relaxed security paradigm and define and analyze the leakage. Our evaluations show that our logistic regression protocol is up to 9x faster, and the neural network training is up to 688x faster than SecureML. Notably, our neural network achieves an accuracy of 96.6% on MNIST in 15 epochs, outperforming prior benchmarks that capped at 93.4% using the same architecture.
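
To make the idea of a secret-shared lookup table concrete, the following is a minimal sketch of a generic one-time masked-table lookup between two servers. It is not the Hawk protocol itself: the table size, moduli, fixed-point scale, sigmoid input range, and all function names (`share`, `offline_setup`, `online_lookup`, and so on) are assumptions made for illustration, and the trusted offline dealer stands in for whatever preprocessing a real protocol would use.

```python
import math
import secrets

# All constants below are illustrative assumptions, not values from the paper.
N = 1 << 8        # number of table entries; indices live in Z_N
MOD = 1 << 32     # modulus for additive shares of table values
SCALE = 1 << 16   # fixed-point scale for the activation output


def share(value, modulus):
    """Split value into two additive shares modulo `modulus`."""
    s0 = secrets.randbelow(modulus)
    return s0, (value - s0) % modulus


def reconstruct(s0, s1, modulus):
    """Recombine two additive shares."""
    return (s0 + s1) % modulus


def sigmoid_table():
    """Quantized sigmoid: index i encodes x = -8 + 16*i/N (assumed range)."""
    return [round(SCALE / (1.0 + math.exp(-(-8.0 + 16.0 * i / N))))
            for i in range(N)]


def offline_setup(table):
    """Hypothetical dealer: rotate the table by a random r, then share both."""
    r = secrets.randbelow(N)
    shifted = [table[(j - r) % N] for j in range(N)]  # shifted[x + r] == table[x]
    t0, t1 = [], []
    for v in shifted:
        a, b = share(v, MOD)
        t0.append(a)
        t1.append(b)
    r0, r1 = share(r, N)
    return (t0, r0), (t1, r1)


def online_lookup(x0, x1, state0, state1):
    """Servers open the masked index x + r and read their table shares there."""
    t0, r0 = state0
    t1, r1 = state1
    m0 = (x0 + r0) % N      # computed locally by server 0
    m1 = (x1 + r1) % N      # computed locally by server 1
    m = (m0 + m1) % N       # exchanged and opened: uniformly random, hides x
    return t0[m], t1[m]     # additive shares of table[x]


if __name__ == "__main__":
    table = sigmoid_table()
    x = 200                               # secret index held only in shared form
    x0, x1 = share(x, N)
    state0, state1 = offline_setup(table)
    y0, y1 = online_lookup(x0, x1, state0, state1)
    print(reconstruct(y0, y1, MOD) == table[x])
```

In this sketch each rotated table is used for a single lookup. Reusing the same offset `r` for two lookups would let the servers learn the difference between the two indices from the opened values, which is the kind of access-pattern leakage the abstract proposes to bound rather than eliminate entirely.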

