The massive deployment of Machine Learning (ML) models raises serious concerns about data protection. Privacy-enhancing technologies (PETs) offer a promising first step, but significant challenges remain in achieving confidentiality and differential privacy in distributed learning. In this paper, we describe a novel, regulation-compliant data protection technique for the distributed training of ML models, applicable throughout the ML life cycle regardless of the underlying ML architecture. Designed from the data owner's perspective, our method protects both training data and ML model parameters by employing a protocol based on a quantized multi-hash data representation, Hash-Comb, combined with randomization. The hyper-parameters of our scheme can be shared using standard Secure Multi-Party Computation (SMPC) protocols. Our experimental results demonstrate the robustness and accuracy-preserving properties of our approach.