We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable to that of the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $\Theta\left(\frac{k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq \alpha + 9\min_{f\in\mathcal{F}} d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection: it breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ on the sample complexity of non-interactive hypothesis selection, using only $\Theta(\log\log k)$ rounds of interaction. To prove our results, we introduce the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA), which may be of independent interest. Informally, an SQA uses a small number of critical queries if its success relies on the accuracy of only a small number of the queries it asks. We then design an LDP algorithm for hypothesis selection that uses only a small number of critical queries.
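As background, the classical non-private primitive that hypothesis selection builds on can be sketched as a Scheffé tournament: each pair of hypotheses is compared on its Scheffé set, and the hypothesis winning the most pairwise comparisons is returned. The sketch below is illustrative only and is not the paper's LDP algorithm; the function name, the finite-support representation of distributions, and the most-wins rule are assumptions made for the example.

```python
import numpy as np

def scheffe_select(hypotheses, samples, support):
    """Hypothetical non-private Scheffé tournament.

    hypotheses: list of probability vectors over `support`.
    samples:    i.i.d. draws from the unknown distribution h.
    Returns the index of the hypothesis winning the most pairwise tests.
    """
    k = len(hypotheses)
    # Empirical distribution of the samples over the finite support.
    idx = {x: i for i, x in enumerate(support)}
    counts = np.zeros(len(support))
    for s in samples:
        counts[idx[s]] += 1
    emp = counts / len(samples)

    wins = np.zeros(k, dtype=int)
    for i in range(k):
        for j in range(i + 1, k):
            # Scheffé set A_{ij} = {x : f_i(x) > f_j(x)}.
            A = hypotheses[i] > hypotheses[j]
            pi = hypotheses[i][A].sum()   # f_i(A)
            pj = hypotheses[j][A].sum()   # f_j(A)
            ph = emp[A].sum()             # empirical mass of A under h
            # The hypothesis whose mass on A is closer to h's wins.
            if abs(pi - ph) <= abs(pj - ph):
                wins[i] += 1
            else:
                wins[j] += 1
    return int(np.argmax(wins))
```

Each pairwise test reduces to estimating one statistical query, the mass $h(A_{ij})$ of a Scheffé set; the paper's notion of critical queries concerns how many such queries must be answered accurately for the tournament's output to be correct.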