We address the problem of efficiently and effectively answering large numbers of queries on a sensitive dataset while ensuring differential privacy (DP). We analyze this problem in two settings, grounding our work in a state-of-the-art DP mechanism for large-scale query answering: the Relaxed Adaptive Projection (RAP) mechanism. The first is a classic setting in the DP literature, in which all queries are known to the mechanism in advance. Within this setting, we identify challenges in the RAP mechanism's original analysis, then overcome them with an enhanced implementation and analysis. We then extend the RAP mechanism to answer a more general and powerful class of queries (r-of-k thresholds) than previously considered. Evaluating this class empirically, we find that the mechanism can answer query sets orders of magnitude larger than those in prior work, and that it does so quickly and with high utility. We then define a second setting, motivated by real-world considerations and inspired by work in machine learning. In this new setting, a mechanism is given only partial knowledge of the queries that will be posed in the future, and it is expected to answer those queries with high utility. We formally define this setting and how to measure a mechanism's utility within it, then conduct a comprehensive empirical evaluation of the RAP mechanism's utility in this setting. From this evaluation, we find that even with weak partial knowledge of the future queries, the mechanism is able to efficiently and effectively answer arbitrary queries posed in the future. Taken together, the results from these two settings advance the state of the art in differentially private large-scale query answering.
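To make the r-of-k threshold query class mentioned in the abstract concrete, the following is a minimal sketch of how such a query could be evaluated on a toy dataset. It assumes the usual reading of the term: a record satisfies the query when at least r of k attribute-value predicates hold, and the answer is the fraction of records that do. The function names, the dictionary-based record format, and the example data are all hypothetical. The sketch computes exact, non-private answers only; it does not implement the RAP mechanism or any privacy protection.

```python
def r_of_k_threshold(record, predicates, r):
    """Return 1 if at least r of the k (attribute, value) predicates hold, else 0.

    `record` is a dict mapping attribute names to values (an assumed format).
    """
    hits = sum(1 for attr, value in predicates if record.get(attr) == value)
    return 1 if hits >= r else 0


def answer_query(dataset, predicates, r):
    """Exact (non-private) answer: fraction of records satisfying the threshold."""
    return sum(r_of_k_threshold(rec, predicates, r) for rec in dataset) / len(dataset)


# Hypothetical toy dataset with three categorical attributes.
dataset = [
    {"age": "30-39", "sex": "F", "state": "CA"},  # matches 3 predicates
    {"age": "30-39", "sex": "M", "state": "NY"},  # matches 1 predicate
    {"age": "40-49", "sex": "F", "state": "CA"},  # matches 2 predicates
    {"age": "20-29", "sex": "M", "state": "TX"},  # matches 0 predicates
]
predicates = [("age", "30-39"), ("sex", "F"), ("state", "CA")]  # k = 3

at_least_one = answer_query(dataset, predicates, r=1)  # disjunction of the predicates
at_least_two = answer_query(dataset, predicates, r=2)
all_three = answer_query(dataset, predicates, r=3)     # conjunction of the predicates
```

Under this reading, setting r = k recovers a conjunction over the k predicates and r = 1 recovers a disjunction, which is one way to see why the class is more general than conjunctions alone.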