Differential privacy is the de facto standard for protecting privacy in a wide range of applications. One of its key challenges is private data release, which is particularly relevant when little is known in advance about the statistics that will later be computed on the released data. Recent work has presented a differentially private data release algorithm that achieves optimal rates of order $n^{-1/d}$ for the worst-case error over all Lipschitz continuous statistics, where $n$ is the size of the dataset and $d$ is its dimension. This type of guarantee is desirable in many practical applications; for instance, it ensures that clusters present in the data are preserved. However, because of these "slow" rates, the approach is often infeasible in practice unless the dimension of the data is small. We demonstrate that these rates can be significantly improved to $n^{-1/s}$ when guarantees are only required over $s$-sparse Lipschitz continuous functions, or to $n^{-1/(s+1)}$ when the data lie on an unknown $s$-dimensional subspace (disregarding logarithmic factors). We therefore obtain practically meaningful rates for moderate values of $s$, which motivates future work on computationally efficient approximate algorithms for this problem.
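To give a rough sense of the gap between these rates, the sketch below evaluates the error scaling $n^{-1/k}$ for $k = d$, $k = s$ and $k = s+1$, and inverts it to estimate how many samples are needed for a target error $\varepsilon$, namely $n \approx \varepsilon^{-k}$. It is only an illustration under stated assumptions: constants and logarithmic factors are ignored, and the values of $n$, $d$, $s$ and $\varepsilon$ as well as the helper names `rate` and `samples_needed` are hypothetical choices, not taken from the paper.

```python
def rate(n: int, k: float) -> float:
    """Error scaling n^(-1/k); constants and log factors are ignored (assumption)."""
    return n ** (-1.0 / k)


def samples_needed(eps: float, k: float) -> float:
    """Sample size (up to constants) at which n^(-1/k) equals eps, i.e. eps^(-k)."""
    return eps ** (-k)


# Hypothetical illustrative values, not from the paper.
n, d, s, eps = 10**6, 10, 2, 0.1

print(f"error, full dimension d={d}:    {rate(n, d):.3f}")
print(f"error, s-sparse, s={s}:          {rate(n, s):.3f}")
print(f"error, s-dim subspace, s+1={s + 1}: {rate(n, s + 1):.3f}")

# Samples needed to reach error eps under each scaling.
for label, k in [("d", d), ("s", s), ("s+1", s + 1)]:
    print(f"n needed for eps={eps} with k={label}: {samples_needed(eps, k):.3g}")
```

With $d = 10$, reaching an error of $0.1$ requires on the order of $10^{10}$ samples, whereas with $s = 2$ the sparse and subspace settings need only on the order of $10^{2}$ and $10^{3}$ samples respectively, which is the sense in which the improved rates become practically meaningful for moderate $s$.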