Given a database of bit strings $A_1,\ldots,A_m\in \{0,1\}^n$, a fundamental data structure task is to estimate the distances between a given query $B\in \{0,1\}^n$ and all the strings in the database. In addition, one might want to ensure the integrity of the database by releasing these distance statistics in a secure manner. In this work, we propose differentially private (DP) data structures for this type of task, with a focus on Hamming and edit distance. On top of the strong privacy guarantees, our data structures are also time- and space-efficient. In particular, our data structures are $\epsilon$-DP against any sequence of queries of arbitrary length, and for any query $B$ such that the maximum distance to any string in the database is at most $k$, they output $m$ distance estimates. Moreover:

- For Hamming distance, our data structure answers any query in $\widetilde O(mk+n)$ time, and each estimate deviates from the true distance by at most $\widetilde O(k/e^{\epsilon/\log k})$;
- For edit distance, our data structure answers any query in $\widetilde O(mk^2+n)$ time, and each estimate deviates from the true distance by at most $\widetilde O(k/e^{\epsilon/(\log k \log n)})$.

For moderate $k$, both data structures support sublinear query operations. We obtain these results via a novel adaptation of the randomized response technique as a bit flipping procedure, applied to the sketched strings.
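To make the bit flipping idea concrete, the following is a minimal sketch of classical randomized response applied to a raw bit string, together with a debiased Hamming distance estimate. It is not the construction of this work: it skips the sketching step entirely, so it does not achieve the stated query time or error bounds. The function names (`flip_probability`, `randomized_response`, `debiased_hamming`) and the choice of flip probability $1/(1+e^{\epsilon})$ are assumptions made for illustration, and the privacy statement in the comments assumes neighboring databases that differ in a single bit.

```python
import math
import random


def flip_probability(epsilon: float) -> float:
    """Probability of flipping one bit under classical randomized response."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng):
    """Flip each bit independently with probability 1 / (1 + e^epsilon).

    Assumption: neighboring databases differ in one bit. The noisy string is
    then epsilon-DP, and any number of queries answered from it is
    post-processing of that single release.
    """
    q = flip_probability(epsilon)
    return [b ^ 1 if rng.random() < q else b for b in bits]


def hamming(a, b):
    """Exact Hamming distance between two equal-length bit lists."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def debiased_hamming(noisy_bits, query, epsilon):
    """Unbiased estimate of the distance between the original string and query.

    If d is the true distance and q the flip probability, the noisy distance
    has expectation d * (1 - q) + (n - d) * q = d * (1 - 2q) + n * q,
    so we subtract n * q and divide by (1 - 2q).
    """
    n = len(query)
    q = flip_probability(epsilon)
    return (hamming(noisy_bits, query) - n * q) / (1.0 - 2.0 * q)


if __name__ == "__main__":
    rng = random.Random(2024)
    database_string = [rng.randint(0, 1) for _ in range(64)]
    query = list(database_string)
    for i in range(5):  # plant a true distance of 5
        query[i] ^= 1
    noisy = randomized_response(database_string, epsilon=2.0, rng=rng)
    print("true distance:", hamming(database_string, query))
    print("private estimate:", debiased_hamming(noisy, query, epsilon=2.0))
```

In this naive version the noise is added to all $n$ positions, so the error of the estimate grows with $\sqrt{n}$ rather than with $k$. This is one way to see why the abstract applies the flipping to sketched strings instead of the raw ones.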