Fingerprinting Codes Meet Geometry: Improved Lower Bounds for Private Query Release and Adaptive Data Analysis

Fingerprinting codes are a crucial tool for proving lower bounds in differential privacy. They have been used to prove tight lower bounds for several fundamental questions, especially in the ``low accuracy'' regime. Unlike reconstruction/discrepancy approaches, however, they are better suited to query sets that arise naturally from the fingerprinting-code construction. In this work, we propose a general framework for proving fingerprinting-type lower bounds that allows us to tailor the technique to the geometry of the query set. Our approach yields several new results, including the following. First, we show that any (sample- and population-)accurate algorithm for answering $Q$ arbitrary adaptive counting queries over a universe $\mathcal{X}$ to accuracy $\alpha$ needs $\Omega(\frac{\sqrt{\log |\mathcal{X}|}\cdot \log Q}{\alpha^3})$ samples, matching known upper bounds. This shows that approaches based on differential privacy are optimal for this question, and it significantly improves on the previously known lower bounds of $\frac{\log Q}{\alpha^2}$ and $\min(\sqrt{Q}, \sqrt{\log |\mathcal{X}|})/\alpha^2$. Second, we show that any $(\varepsilon,\delta)$-DP algorithm for answering $Q$ counting queries to accuracy $\alpha$ needs $\Omega(\frac{\sqrt{ \log|\mathcal{X}| \log(1/\delta)} \log Q}{\varepsilon\alpha^2})$ samples, matching known upper bounds up to constants. Our framework proves this bound via a direct correlation analysis and improves the prior bound of [BUV'14] by a factor of $\sqrt{\log(1/\delta)}$. Third, we characterize the sample complexity of answering a set of random $0$-$1$ queries under approximate differential privacy. We give new upper and lower bounds in different regimes and, by combining them with known results, obtain a complete picture.
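As a rough illustration of how the stated bounds compare, the following sketch evaluates the expressions from the abstract with all constant factors dropped. Because $\Omega(\cdot)$ hides constants, the outputs are growth-rate comparisons, not actual sample counts. The function names and the parameter values in the usage block are hypothetical choices for illustration; `log_universe` stands for $\log|\mathcal{X}|$, and natural logarithms are assumed.

```python
import math

# Illustrative sketch only: Omega(.) hides constants, so every constant factor
# is dropped here. `log_universe` plays the role of log|X| (natural log assumed).


def adaptive_lower_new(log_universe: float, num_queries: int, alpha: float) -> float:
    """New adaptive-data-analysis bound: sqrt(log|X|) * log Q / alpha^3."""
    return math.sqrt(log_universe) * math.log(num_queries) / alpha**3


def adaptive_lower_prior(log_universe: float, num_queries: int, alpha: float) -> float:
    """Best of the two previously known bounds quoted in the abstract:
    log Q / alpha^2 and min(sqrt(Q), sqrt(log|X|)) / alpha^2."""
    b1 = math.log(num_queries) / alpha**2
    b2 = min(math.sqrt(num_queries), math.sqrt(log_universe)) / alpha**2
    return max(b1, b2)


def dp_lower_new(log_universe: float, num_queries: int, alpha: float,
                 eps: float, delta: float) -> float:
    """(eps, delta)-DP bound: sqrt(log|X| * log(1/delta)) * log Q / (eps * alpha^2)."""
    return (math.sqrt(log_universe * math.log(1 / delta)) * math.log(num_queries)
            / (eps * alpha**2))


if __name__ == "__main__":
    # Hypothetical parameters chosen only to make the comparison concrete.
    log_x, q, alpha = 1000.0, 10**6, 0.1
    eps, delta = 1.0, 1e-9
    new = adaptive_lower_new(log_x, q, alpha)
    prior = adaptive_lower_prior(log_x, q, alpha)
    print(f"adaptive: new/prior = {new / prior:.1f}")
    print(f"DP bound (constants dropped) = {dp_lower_new(log_x, q, alpha, eps, delta):.3g}")
    # Per the abstract, the DP bound improves [BUV'14] by sqrt(log(1/delta)).
    print(f"sqrt(log(1/delta)) = {math.sqrt(math.log(1 / delta)):.2f}")
```

When $\log Q \le \sqrt{\log|\mathcal{X}|} \le \sqrt{Q}$, as with these parameters, the ratio of the new adaptive bound to the best prior one simplifies to $\log Q/\alpha$; when $\log Q > \sqrt{\log|\mathcal{X}|}$ it is $\sqrt{\log|\mathcal{X}|}/\alpha$ instead.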
