Distributed online learning is gaining traction because of its ability to process large-scale datasets and streaming data. To address growing public concern about privacy protection, many private distributed online learning algorithms have been proposed, mostly based on differential privacy, which has emerged as the ``gold standard'' for privacy protection. However, these algorithms often face the dilemma of trading learning accuracy for privacy. By exploiting the unique characteristics of online learning, this paper proposes an approach that tackles the dilemma and ensures both differential privacy and learning accuracy in distributed online learning. More specifically, while ensuring a diminishing expected instantaneous regret, the approach simultaneously ensures a finite cumulative privacy budget, even over an infinite time horizon. To cater for the fully distributed setting, we adopt the local differential-privacy framework, which avoids reliance on a trusted data curator and hence provides stronger protection than the classic ``centralized'' (global) differential privacy. To the best of our knowledge, this is the first algorithm that successfully ensures both rigorous local differential privacy and learning accuracy. The effectiveness of the proposed algorithm is evaluated on machine learning tasks, including logistic regression on the ``Mushrooms'' and ``Covtype'' datasets and CNN-based image classification on the ``MNIST'' and ``CIFAR-10'' datasets.
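To make the notion of a finite cumulative privacy budget on an infinite horizon concrete, the following minimal sketch may help. It is not the algorithm proposed in the paper, whose mechanism the abstract does not describe. It assumes basic sequential composition of pure differential privacy, under which per-round budgets simply add up, and it uses a hypothetical per-round schedule eps0 / t^p with p > 1. The names `cumulative_budget`, `constant_budget`, `decaying_budget`, `eps0`, and `p` are illustrative assumptions.

```python
import math


def cumulative_budget(eps_per_round, horizon):
    """Total budget after `horizon` rounds under basic sequential composition,
    i.e. the plain sum of the per-round budgets eps_per_round(1..horizon)."""
    return sum(eps_per_round(t) for t in range(1, horizon + 1))


def constant_budget(t, eps0=0.1):
    """Hypothetical baseline: the same budget is spent in every round."""
    return eps0


def decaying_budget(t, eps0=0.1, p=2.0):
    """Hypothetical summable schedule: eps0 / t**p with p > 1 (assumption,
    not taken from the paper)."""
    return eps0 / t ** p


if __name__ == "__main__":
    # With p = 2 the infinite sum equals eps0 * pi**2 / 6.
    limit = 0.1 * math.pi ** 2 / 6
    for horizon in (10, 1_000, 100_000):
        const_total = cumulative_budget(constant_budget, horizon)
        decay_total = cumulative_budget(decaying_budget, horizon)
        print(f"T={horizon:>6}  constant={const_total:.4f}  "
              f"decaying={decay_total:.6f}  (limit {limit:.6f})")
```

Under these assumptions, the constant schedule's total grows linearly with the number of rounds, while the decaying schedule's total stays below a fixed limit no matter how long learning continues. How the per-round privacy loss can be made to decay without sacrificing accuracy is the paper's contribution and is not captured by this sketch.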