Profile Reconstruction from Private Sketches

Given a multiset of $n$ items from $\mathcal{D}$, the \emph{profile reconstruction} problem is to estimate, for $t = 0, 1, \dots, n$, the fraction $\vec{f}[t]$ of items in $\mathcal{D}$ that appear exactly $t$ times. We consider differentially private profile estimation in a distributed, space-constrained setting, where we wish to maintain an updatable, private sketch of the multiset from which an approximation of $\vec{f} = (\vec{f}[0], \dots, \vec{f}[n])$ can be computed. Starting from a histogram privatized with discrete Laplace noise, we show how to ``reverse'' the noise using an approach of Dwork et al.~(ITCS '10). We speed up their LP-based technique from polynomial time to $O(d + n \log n)$, where $d = |\mathcal{D}|$, and analyze the achievable error in the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. In all cases the dependency of the error on $d$ is $O(1 / \sqrt{d})$. We also give an information-theoretic lower bound showing that this dependence on $d$ is asymptotically optimal among all private, updatable sketches for the profile reconstruction problem with a high-probability error guarantee.
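To make the objects in the abstract concrete, the following is a minimal Python sketch of the exact profile $\vec{f}$ and of a histogram perturbed with discrete Laplace noise. It is an illustration under stated assumptions, not the paper's method: the function names, the noise parameter `epsilon`, and the choice of adding independent noise with $\Pr[Z = k] \propto e^{-\epsilon |k|}$ to every histogram entry are hypothetical, and the noise-reversal step itself (the LP-based technique and its $O(d + n \log n)$ speed-up) is not reproduced here.

```python
import math
import random
from collections import Counter


def exact_profile(items, domain_size):
    """Return [f[0], ..., f[n]], where f[t] is the fraction of the
    domain's elements that occur exactly t times in `items`."""
    n = len(items)
    counts = Counter(items)
    freq = [0] * (n + 1)
    for c in counts.values():
        freq[c] += 1
    freq[0] = domain_size - len(counts)  # domain elements that never appear
    return [x / domain_size for x in freq]


def discrete_laplace(epsilon, rng):
    """Sample Z with P(Z = k) proportional to exp(-epsilon * |k|),
    as the difference of two i.i.d. geometric variables."""
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        return int(-math.log(u) / epsilon)  # floor of a non-negative value
    return geometric() - geometric()


def noisy_histogram(items, domain, epsilon, rng):
    """Assumed privatization: add independent discrete Laplace noise
    to the count of every element of the domain."""
    counts = Counter(items)
    return {x: counts[x] + discrete_laplace(epsilon, rng) for x in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c", "d"]
    items = ["a", "a", "b"]
    print(exact_profile(items, len(domain)))  # [0.5, 0.25, 0.25, 0.0]
    print(noisy_histogram(items, domain, epsilon=1.0, rng=rng))
```

Reading the profile directly off the noisy counts would be biased, since many of the $d$ entries are shifted by noise; this is the gap that reversing the noise is meant to close.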
