We study the problem of online conditional distribution estimation with \emph{unbounded} label sets under local differential privacy. Let $\mathcal{F}$ be a distribution-valued function class with an unbounded label set. We aim to estimate an \emph{unknown} function $f\in \mathcal{F}$ in an online fashion: at each time $t$, given the context $\boldsymbol{x}_t$, we produce an estimate of $f(\boldsymbol{x}_t)$, evaluated under KL-divergence, knowing only privatized versions of the true labels sampled from $f(\boldsymbol{x}_t)$. The objective is to minimize the cumulative KL-risk over a finite horizon $T$. We show that when the privatized labels satisfy $(\epsilon,0)$-local differential privacy, the KL-risk grows as $\tilde{\Theta}(\frac{1}{\epsilon}\sqrt{KT})$ up to poly-logarithmic factors, where $K=|\mathcal{F}|$. This is in stark contrast to the $\tilde{\Theta}(\sqrt{T\log K})$ bound demonstrated by Wu et al. (2023a) for bounded label sets. As a byproduct, our results recover a nearly tight upper bound for the hypothesis selection problem of Gopi et al. (2020), which was established only for the batch setting.
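For concreteness, the two quantities named in the abstract can be sketched as follows. The notation here is assumed for illustration and is not taken from the paper: $\mathcal{Y}$ denotes the label set, $\mathcal{R}$ the randomized mechanism that privatizes a label, and $\hat{p}_t$ the estimate produced at time $t$. The first line is the standard definition of $(\epsilon,0)$-local differential privacy. The second is one plausible form of the cumulative KL-risk; the direction of the divergence and the omission of any expectation or supremum over contexts are assumptions.

$$
\Pr[\mathcal{R}(y)\in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{R}(y')\in S]
\quad \text{for all } y,y'\in\mathcal{Y} \text{ and all measurable } S,
$$

$$
\text{Risk}_T \;=\; \sum_{t=1}^{T} \mathrm{KL}\big(f(\boldsymbol{x}_t)\,\big\|\,\hat{p}_t\big).
$$

Under this reading, the stated rate $\tilde{\Theta}(\frac{1}{\epsilon}\sqrt{KT})$ depends polynomially on $K=|\mathcal{F}|$, whereas the bounded-label rate $\tilde{\Theta}(\sqrt{T\log K})$ depends on $K$ only logarithmically.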