Distributed online learning is gaining traction due to its unique ability to process large-scale datasets and streaming data. To address growing public awareness of, and concern about, privacy protection, many algorithms have been proposed to enable differential privacy in distributed online optimization and learning. However, these algorithms often face the dilemma of trading learning accuracy for privacy. By exploiting the unique characteristics of online learning, this paper proposes an approach that resolves this dilemma and ensures both differential privacy and learning accuracy in distributed online learning. More specifically, while ensuring a diminishing expected instantaneous regret, the approach simultaneously ensures a finite cumulative privacy budget, even over an infinite time horizon. To cater to the fully distributed setting, we adopt the local differential-privacy framework, which avoids the reliance on a trusted data curator required in the classic "centralized" (global) differential-privacy framework. To the best of our knowledge, this is the first algorithm that ensures both rigorous local differential privacy and learning accuracy. The effectiveness of the proposed algorithm is evaluated on machine learning tasks, including logistic regression on the "mushrooms" dataset and CNN-based image classification on the "MNIST" and "CIFAR-10" datasets.
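To make the local differential-privacy setting concrete, below is a minimal, hypothetical sketch: each learner perturbs its own message (e.g., a gradient) with Laplace noise before sharing it, so no trusted curator ever sees raw data. The function names, the choice of the Laplace mechanism, and the decaying per-round budget schedule are illustrative assumptions for exposition, not the paper's actual algorithm.

```python
import random

def laplace_noise(scale, rng=random):
    # The difference of two i.i.d. Exp(1) variables is Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def privatize(gradient, sensitivity, epsilon):
    """Perturb a local gradient before sharing (epsilon-LDP Laplace mechanism).

    `sensitivity` bounds how much one data point can change the gradient;
    `epsilon` is this round's privacy budget (smaller = more private).
    """
    scale = sensitivity / epsilon
    return [g + laplace_noise(scale) for g in gradient]

# Illustrative schedule (an assumption, not the paper's): a per-round budget
# epsilon_t = eps0 / t**2 keeps the cumulative budget finite even over an
# infinite horizon, since sum_t eps0 / t**2 converges (to eps0 * pi**2 / 6),
# echoing the "finite cumulative privacy budget" property.
eps0 = 1.0
for t in range(1, 4):
    noisy = privatize([0.2, -0.5, 1.0], sensitivity=1.0, epsilon=eps0 / t**2)
```

Note that a shrinking per-round budget implies growing noise over time, which is why such schemes are typically paired with decaying stepsizes so that the regret can still diminish.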