Two parties wish to collaborate on their datasets. Before revealing their datasets to each other, however, they want a guarantee that the collaboration will be fruitful. We look at this problem from the point of view of machine learning, where one party is promised an improvement in its prediction model by incorporating data from the other party. The parties wish to collaborate further only if the updated model shows an improvement in accuracy; until this is ascertained, neither party wants to disclose its model or dataset. In this work, we construct an interactive protocol for this problem based on the fully homomorphic encryption scheme over the torus (TFHE) and label differential privacy, where the underlying machine learning model is a neural network. Label differential privacy is used so that computations need not be done entirely in the encrypted domain, which is a significant bottleneck for neural network training with current state-of-the-art FHE implementations. We prove the security of our scheme in the universal composability framework assuming honest-but-curious parties, while allowing that one party may not have any expertise in labelling its initial dataset. Experiments show that we can obtain the output, i.e., the accuracy of the updated model, many orders of magnitude faster than a protocol that uses FHE operations exclusively.
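
To make the role of label differential privacy concrete, the following is a minimal sketch of k-ary randomized response, a standard mechanism that satisfies epsilon-label differential privacy. The abstract does not state which mechanism the protocol uses, so this is an illustrative assumption; the function name `randomized_response` and its parameters are hypothetical.

```python
import math
import random


def randomized_response(label, num_classes, epsilon, rng=random):
    """Return a noisy copy of `label` via k-ary randomized response.

    Illustrative assumption: this is a textbook label-DP mechanism,
    not necessarily the one used in the protocol described above.
    """
    # Probability of reporting the true label: e^eps / (e^eps + k - 1).
    exp_eps = math.exp(epsilon)
    keep_prob = exp_eps / (exp_eps + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    # Otherwise report one of the other k - 1 labels uniformly at random.
    other = rng.randrange(num_classes - 1)
    return other if other < label else other + 1


if __name__ == "__main__":
    rng = random.Random(0)
    true_labels = [0, 1, 2, 3, 1, 0]
    noisy_labels = [randomized_response(y, 4, 1.0, rng) for y in true_labels]
    print(noisy_labels)
```

Because only the labels are perturbed, a party holding such noisy labels could in principle train on them in the clear, which is one way part of the work can be kept out of the encrypted domain.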