The exponential growth in the amount of available data makes using it without violating users' privacy one of the fundamental problems of computer science. This problem has been investigated thoroughly within the framework of differential privacy. However, most of the literature does not address settings in which the data is so large that the exact answer cannot be computed even in the non-private setting (for example, the streaming and sublinear-time settings). This can often make differential privacy infeasible in practice. In this paper, we present a general approach for making Monte Carlo randomized approximation algorithms differentially private. We only need to assume that the error $R$ of the approximation algorithm is sufficiently concentrated around $0$ (e.g.\ $\mathbb{E}[|R|]$ is bounded) and that the function being approximated has small global sensitivity $\Delta$. Specifically, given a randomized approximation algorithm with sufficiently concentrated error and time/space/query complexity $T(n,\rho)$, where $\rho$ is an accuracy parameter, we can generally obtain an $\epsilon$-differentially private algorithm with the same accuracy and complexity $T(n,\Theta(\epsilon \rho))$.
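For orientation, the following is a minimal sketch of the classical Laplace mechanism, the standard way to release a function with global sensitivity $\Delta$ under $\epsilon$-differential privacy when the function can be computed exactly. It covers only this exact-computation baseline and does not attempt the construction for approximation algorithms described above. The function names (`laplace_noise`, `laplace_mechanism`, `count_ones`) and the bit-counting example are assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(exact_value, sensitivity, epsilon, rng):
    """Release exact_value with Laplace noise of scale sensitivity / epsilon.

    Assumes exact_value is the exact output of a function whose global
    sensitivity is at most `sensitivity`.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return exact_value + laplace_noise(sensitivity / epsilon, rng)


def count_ones(bits):
    """Illustrative query: changing one entry changes the count by at most 1."""
    return sum(bits)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    private_count = laplace_mechanism(count_ones(bits), sensitivity=1.0,
                                      epsilon=0.5, rng=rng)
    print(private_count)
```

The noise scale $\Delta/\epsilon$ grows as $\epsilon$ shrinks, which is the usual privacy/accuracy trade-off. In the setting of the abstract, `exact_value` would not be available and would be replaced by the output of a randomized approximation algorithm with error $R$. That is the case the paper addresses.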