Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem with sparse data, improving upon existing algorithms, particularly in the high-dimensional regime. Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for stochastic convex optimization with sparse gradients; the former give the first nearly dimension-independent rates for this problem. Finally, we study the approximation of stationary points of the empirical loss in approximate-DP optimization and obtain rates that depend on sparsity instead of dimension, up to polylogarithmic factors.