There has been significant recent progress in training differentially private (DP) models whose accuracy approaches that of the best non-private models. These DP models are typically pretrained on large public datasets and then fine-tuned on private downstream datasets that are relatively large and similar in distribution to the pretraining data. However, in many applications, including personalization and federated learning, it is crucial to perform well (i) in the few-shot setting, as obtaining large amounts of labeled data may be problematic; and (ii) on datasets from a wide variety of domains, for use in various specialist settings. To understand the conditions under which few-shot DP can be effective, we perform an exhaustive set of experiments that reveals how the accuracy and vulnerability to attack of few-shot DP image classification models are affected as the number of shots per class, privacy level, model architecture, downstream dataset, and subset of learnable parameters in the model vary. We show that, to achieve DP accuracy on par with non-private models, the number of shots per class must be increased as the privacy level increases. We also show that learning parameter-efficient FiLM adapters under DP is competitive with learning just the final classifier layer or learning all of the network parameters. Finally, we evaluate DP federated learning systems and establish state-of-the-art performance on the challenging FLAIR benchmark.
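To make the term "parameter-efficient FiLM adapters" concrete, the following is a minimal sketch, assuming FiLM in its usual sense of a learned per-channel scale and shift applied to the features of a frozen backbone. The function names (`film`, `film_param_count`, `conv_param_count`) and the layer sizes are hypothetical and chosen for illustration only; they are not taken from the paper's implementation.

```python
from typing import List


def film(features: List[List[float]],
         gamma: List[float],
         beta: List[float]) -> List[List[float]]:
    """Feature-wise linear modulation: y[c][i] = gamma[c] * x[c][i] + beta[c].

    features[c] holds the (flattened) activations of channel c.
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one gamma and one beta per channel")
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


def film_param_count(num_channels: int) -> int:
    """A FiLM layer learns one scale and one shift per channel."""
    return 2 * num_channels


def conv_param_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a standard square 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Illustrative inputs (assumed): two channels, two spatial positions each.
features = [[1.0, 2.0], [3.0, 4.0]]

# gamma = 1, beta = 0 leaves the backbone features unchanged.
identity = film(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])

# Non-trivial modulation rescales and shifts each channel independently.
modulated = film(features, gamma=[2.0, 0.5], beta=[1.0, -1.0])

# Illustrative layer size (assumed): 64 -> 64 channels with a 3x3 kernel.
adapter_params = film_param_count(64)
layer_params = conv_param_count(64, 64, 3)
```

Under these assumed sizes, the adapter has far fewer learnable parameters than the convolution it modulates, which is the sense in which learning only FiLM parameters sits between learning the final classifier layer and learning the whole network.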