In large-scale distributed machine learning systems, coded computing has attracted widespread attention because it effectively mitigates the impact of stragglers. However, several emerging problems greatly limit the performance of coded distributed systems. First, the existence of colluding workers who share results with each other leads to serious privacy leakage. Second, few existing works consider the security of data transmission in distributed computing systems. Third, the number of results that must be waited for increases with the degree of the decoding functions. In this paper, we design a secure and private approximated coded distributed computing (SPACDC) scheme that addresses these problems simultaneously. Our SPACDC scheme guarantees data security during transmission using a new encryption algorithm based on elliptic curve cryptography. Notably, the SPACDC scheme does not impose a strict constraint on the minimum number of results that must be waited for. An extensive performance analysis demonstrates the effectiveness of our SPACDC scheme. Furthermore, we present a secure and private distributed learning algorithm based on the SPACDC scheme, which provides information-theoretic privacy protection for the training data. Our experiments show that the SPACDC-based deep learning algorithm achieves a significant speedup over the baseline approaches.
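To make the third problem concrete, the following is a minimal sketch of a classical exact polynomial-coded scheme (Lagrange-style encoding with random masks). It is not the SPACDC scheme, whose construction is not given here. In this sketch, K data blocks and T random masks are encoded with a polynomial of degree K+T-1, each worker applies a degree-d function f to its share, and the decoding polynomial therefore has degree d(K+T-1), so d(K+T-1)+1 results are needed. The names `lagrange_eval`, `recovery_threshold`, and `run` are hypothetical, and exact rationals stand in for the finite field that a real information-theoretic privacy guarantee would require.

```python
from fractions import Fraction
import random


def lagrange_eval(xs, ys, z):
    """Evaluate at z the unique polynomial through the points (xs[i], ys[i])."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(z - xj, xi - xj)
        total += term
    return total


def recovery_threshold(K, T, deg_f):
    """Results needed to interpolate a decoding polynomial of degree deg_f*(K+T-1)."""
    return deg_f * (K + T - 1) + 1


def run(data, T, f, deg_f, n_workers, seed=0):
    """Encode, compute on all workers, and decode from the minimum number of results."""
    rng = random.Random(seed)
    K = len(data)
    # Random masks hide the data from up to T colluding workers
    # (only a true guarantee over a finite field; rationals are an assumption here).
    masks = [rng.randint(-50, 50) for _ in range(T)]
    betas = list(range(1, K + T + 1))                         # interpolation points
    alphas = list(range(K + T + 1, K + T + 1 + n_workers))    # worker points
    # Encoding polynomial u satisfies u(betas[j]) = data[j] and then the masks.
    shares = [lagrange_eval(betas, data + masks, a) for a in alphas]
    results = [f(s) for s in shares]                          # each worker computes f(u(alpha))
    need = recovery_threshold(K, T, deg_f)
    if n_workers < need:
        raise ValueError("not enough workers to decode exactly")
    # Pretend an arbitrary subset of exactly `need` workers finished first.
    fastest = rng.sample(range(n_workers), need)
    xs = [alphas[i] for i in fastest]
    ys = [results[i] for i in fastest]
    # Interpolate f(u(z)) and read off f(data[j]) at z = betas[j].
    return [lagrange_eval(xs, ys, b) for b in betas[:K]]


if __name__ == "__main__":
    print(recovery_threshold(2, 1, 2), recovery_threshold(2, 1, 3))
    print(run([3, 5], T=1, f=lambda x: x * x, deg_f=2, n_workers=8))
```

With K = 2 and T = 1, raising the degree of f from 2 to 3 raises the number of required results from 5 to 7. This growth is the strict constraint that an approximate scheme such as SPACDC aims to relax.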