Private data analysis suffers from the well-known curse of dimensionality, which drives up its cost. Many datasets, however, possess an inherent low-dimensional structure. For instance, during optimization via gradient descent, the gradients frequently reside near a low-dimensional subspace. If this low-dimensional structure could be privately identified using a small number of points, we could avoid paying for the high ambient dimension. On the negative side, Dwork, Talwar, Thakurta, and Zhang (STOC 2014) proved that privately estimating subspaces, in general, requires a number of points that depends polynomially on the dimension. Their bound, however, does not rule out reducing the number of points for "easy" instances. Yet providing a measure that captures how "easy" a given dataset is for this task has proved challenging, and it was not properly addressed in prior work. Inspired by the work of Singhal and Steinke (NeurIPS 2021), we provide the first measures that quantify easiness as a function of multiplicative singular-value gaps in the input dataset, and we support them with new upper and lower bounds. In particular, our results determine the first type of gap that is both sufficient and necessary for estimating a subspace with a number of points independent of the dimension. Finally, we realize our upper bounds with a practical algorithm and demonstrate its advantage over prior approaches in high-dimensional regimes.
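To make the notion of a multiplicative singular-value gap concrete, here is a minimal NumPy sketch (not from the paper; the dimensions, noise level, and variable names are illustrative). It generates points concentrated near a k-dimensional subspace of a d-dimensional ambient space and shows that the ratio between the k-th and (k+1)-th singular values of the data matrix is large, which is the kind of "easy" instance the measures above are meant to capture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 3  # n points in ambient dimension d, near a k-dim subspace

# Draw an orthonormal basis for a random k-dimensional subspace of R^d.
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]

# Points lying in the subspace, perturbed by small off-subspace noise.
X = rng.standard_normal((n, k)) @ basis.T
X += 0.01 * rng.standard_normal((n, d))

s = np.linalg.svd(X, compute_uv=False)  # singular values, descending order
gap = s[k - 1] / s[k]                   # multiplicative gap between sigma_k and sigma_{k+1}
print(f"sigma_k / sigma_(k+1) = {gap:.1f}")  # large ratio => pronounced low-dim structure
```

With negligible noise the top k singular values scale with the spread of the in-subspace part while the remaining ones stay near the noise level, so the multiplicative gap grows as the instance becomes "easier".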