We study an information-theoretic privacy mechanism design problem in which an agent observes useful data $Y$ that is arbitrarily correlated with sensitive data $X$ and designs disclosed data $U$ generated from $Y$ (the agent has no direct access to $X$). We introduce \emph{sparse point-wise privacy leakage}, a worst-case privacy criterion that enforces two simultaneous constraints for every disclosed symbol $u\in\mathcal{U}$: (i) $u$ may be correlated with at most $N$ realizations of $X$, and (ii) the total leakage toward those realizations is bounded. In the high-privacy regime, we use concepts from information geometry to obtain a local quadratic approximation of the mutual information between $U$ and $Y$, which measures utility. When the leakage matrix $P_{X|Y}$ is invertible, this approximation reduces the design problem to a sparse quadratic maximization, that is, a Rayleigh-quotient problem with an $\ell_0$ constraint. We further show that, for the approximated problem, one can restrict attention to a binary released variable $U$ with a uniform distribution without loss of optimality. For small alphabet sizes, the exact sparsity-constrained optimum can be computed via combinatorial support enumeration, but this quickly becomes intractable as the dimension grows. For general dimensions, the resulting sparse Rayleigh-quotient maximization is NP-hard and closely related to sparse principal component analysis (PCA). We propose a convex semidefinite programming (SDP) relaxation that is solvable in polynomial time and provides a tractable surrogate for the NP-hard design, together with a simple rounding procedure to recover a feasible leakage direction. We also identify a sparsity threshold beyond which the sparse optimum saturates at the unconstrained spectral value and the SDP relaxation becomes tight.
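The combinatorial support enumeration mentioned above can be illustrated with a minimal sketch. It assumes a generic symmetric matrix `A` standing in for the quadratic form of the approximated problem; the abstract does not specify that matrix or any further constraints on the leakage direction, so the function name `sparse_rayleigh_bruteforce`, the matrix `A`, and the example values are hypothetical. The sketch maximizes $v^\top A v$ over unit vectors with at most $N$ nonzero entries by solving one small eigenvalue problem per candidate support.

```python
from itertools import combinations

import numpy as np


def sparse_rayleigh_bruteforce(A, N):
    """Maximize v^T A v subject to ||v||_2 = 1 and ||v||_0 <= N.

    A is an assumed symmetric matrix (hypothetical stand-in for the
    quadratic form of the approximated design problem). The cost grows
    as C(n, N), which is why this is only practical for small n.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = min(N, n)
    best_val, best_vec = -np.inf, None
    # Supports of size exactly k suffice: the top eigenvalue of a principal
    # submatrix cannot decrease when the support is enlarged.
    for support in combinations(range(n), k):
        idx = np.array(support)
        sub = A[np.ix_(idx, idx)]
        eigvals, eigvecs = np.linalg.eigh(sub)  # eigenvalues in ascending order
        if eigvals[-1] > best_val:
            best_val = eigvals[-1]
            best_vec = np.zeros(n)
            best_vec[idx] = eigvecs[:, -1]
    return best_val, best_vec


# Illustrative 3x3 example (values chosen for demonstration only).
A_example = np.array([[2.0, 1.0, 0.0],
                      [1.0, 2.0, 0.0],
                      [0.0, 0.0, 2.5]])
val1, vec1 = sparse_rayleigh_bruteforce(A_example, 1)
val2, vec2 = sparse_rayleigh_bruteforce(A_example, 2)
val3, vec3 = sparse_rayleigh_bruteforce(A_example, 3)
```

In this toy example the optimum increases from $N=1$ to $N=2$ and then stays at the largest eigenvalue of `A_example`, which mirrors the saturation behavior described in the abstract. The number of supports grows combinatorially in the dimension, so the enumeration is only a reference solution for small alphabets.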