Differential privacy (DP) is a formal privacy framework that enables training machine learning (ML) models while protecting individuals' data. As prior work has pointed out, ML models are deployed as parts of larger systems, which can give rise to so-called privacy side-channels even when the model training itself satisfies DP. We identify the output label space of a classification model as such a privacy side-channel and present a concrete privacy attack that exploits it. The side-channel becomes highly relevant in continual learning (CL), where the output label space changes over time. To reason about privacy guarantees in CL, we introduce a formalisation of DP for CL, which also clarifies how our approach differs from existing ones. We propose and evaluate two methods for eliminating this side-channel: applying an optimal DP mechanism to release the labels present in the sensitive data, and using a large public label space. We explore the trade-offs between these methods by adapting pre-trained models. We demonstrate empirically that our models consistently achieve higher accuracy under DP than previous work on both Split-CIFAR-100 and Split-ImageNet-R, while operating under a stronger privacy model.
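To make the label-space side-channel concrete, the following is a minimal sketch and not the mechanism proposed in this work. It assumes add/remove adjacency, one label per example, and a finite public label universe; the function names `laplace_noise` and `dp_release_labels`, the threshold value, and the toy labels are all hypothetical. The naive label space is simply the set of labels observed in the sensitive data, so a single individual with a unique label changes it. The sketch replaces that with a thresholded Laplace histogram over the public universe, which is a standard epsilon-DP baseline rather than the optimal mechanism referred to above.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_release_labels(labels, public_universe, epsilon, threshold, rng):
    """Release a label set via a thresholded Laplace histogram (illustrative).

    Assumptions: each example carries exactly one label, neighbouring
    datasets differ by adding or removing one example (so every count has
    sensitivity 1), and `public_universe` is fixed independently of the data.
    Noise is added to every bin of the universe, including empty ones, and
    thresholding is post-processing, so the output is epsilon-DP.
    Floating-point subtleties of Laplace sampling are ignored here.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(labels)
    released = []
    for label in sorted(public_universe):
        noisy_count = counts.get(label, 0) + laplace_noise(1.0 / epsilon, rng)
        if noisy_count > threshold:
            released.append(label)
    return released


# Toy sensitive data: one individual holds the only "rare_disease" example.
labels = ["cat"] * 100 + ["dog"] * 100 + ["rare_disease"]

# Naive label space: leaks the presence of that single individual.
naive_label_space = sorted(set(labels))

# DP release over a hypothetical public universe of candidate labels.
universe = {"bird", "cat", "dog", "rare_disease"}
dp_label_space = dp_release_labels(
    labels, universe, epsilon=1.0, threshold=10.0, rng=random.Random(0)
)
```

The alternative mentioned above, a large public label space, corresponds to skipping the release step and using the whole of `universe` as the classifier's output space, which costs no privacy budget but leaves the model with many classes that have no training examples.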