The success of machine learning (ML) applications relies on vast datasets and distributed architectures, but scaling both creates new challenges. In real-world deployments, data often contains sensitive information, and issues such as data poisoning and hardware failures are common. Ensuring privacy and robustness is therefore vital for the broad adoption of ML in public life. This paper examines the cost of achieving these objectives in distributed architectures. We review what privacy and robustness mean in distributed ML and explain how each can be achieved efficiently in isolation. We contend, however, that combining the two entails a significant loss of computational efficiency. We examine this trade-off, exploring the challenges and solutions for privacy, robustness, and computational efficiency in ML applications.