Differential privacy (DP) enables safe data release, with synthetic data generation emerging as a common approach in recent years. Yet standard synthesizers preserve all dependencies in the data, including spurious correlations between sensitive attributes and outcomes. In fairness-critical settings, this reproduces unwanted bias. A principled remedy is to enforce conditional independence (CI) constraints, which encode domain knowledge or legal requirements that outcomes be independent of sensitive attributes once admissible factors are accounted for. DP synthesis typically proceeds in two phases: (i) a measurement step that privatizes selected marginals, often structured via maximum spanning trees (MSTs), and (ii) a reconstruction step that fits a probabilistic model consistent with the noisy marginals. We propose PrivCI, which enforces CI during the measurement step via a CI-aware greedy MST algorithm that integrates feasibility checks into Kruskal's construction under the exponential mechanism, improving accuracy over competing methods. Experiments on standard fairness benchmarks show that PrivCI achieves stronger fidelity and predictive accuracy than prior baselines while satisfying the specified CI constraints.
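To make the measurement-step idea concrete, the following is a minimal sketch of a Kruskal-style greedy tree construction in which each edge is drawn by the exponential mechanism from the currently feasible candidates. It is not the PrivCI implementation: the abstract does not specify the feasibility check, so the sketch assumes one plausible reading, namely that a constraint "outcome independent of sensitive attribute given admissible set" is satisfied in a tree when every path between the two attributes passes through an admissible attribute. The function name `ci_aware_private_tree`, the `scores` dictionary (standing in for a bounded-sensitivity edge utility computed from the data), the `sensitivity` parameter, and the even split of the budget across rounds are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used for Kruskal-style connectivity checks."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def ci_aware_private_tree(nodes, scores, constraints, epsilon,
                          sensitivity=1.0, rng=None):
    """Greedy tree selection with exponential-mechanism edge sampling.

    scores:      {(u, v): utility}; assumed to have the given sensitivity.
    constraints: list of (sensitive, outcome, admissible_set) triples.
    Assumption: a constraint holds if sensitive and outcome are disconnected
    once admissible nodes are removed from the tree.
    """
    rng = rng or random.Random()
    rounds = len(nodes) - 1
    if rounds < 1:
        return []
    eps_round = epsilon / rounds          # even split, sequential composition
    cycle_uf = UnionFind(nodes)           # tracks components of the forest
    # One structure per constraint: components with admissible nodes removed.
    sep_ufs = [UnionFind(nodes) for _ in constraints]
    tree = []

    def feasible(u, v):
        if cycle_uf.find(u) == cycle_uf.find(v):
            return False                  # would close a cycle
        for (s, y, adm), uf in zip(constraints, sep_ufs):
            if u in adm or v in adm:
                continue                  # edge vanishes when adm is removed
            if {uf.find(u), uf.find(v)} == {uf.find(s), uf.find(y)}:
                return False              # would link s and y, bypassing adm
        return True

    for _ in range(rounds):
        cands = [e for e in scores if feasible(*e)]
        if not cands:
            break                         # constraints may force a forest
        top = max(scores[e] for e in cands)
        # Exponential mechanism; subtracting `top` only rescales the weights.
        weights = [math.exp(eps_round * (scores[e] - top) / (2 * sensitivity))
                   for e in cands]
        u, v = rng.choices(cands, weights=weights, k=1)[0]
        tree.append((u, v))
        cycle_uf.union(u, v)
        for (_s, _y, adm), uf in zip(constraints, sep_ufs):
            if u not in adm and v not in adm:
                uf.union(u, v)
    return tree


# Hypothetical attributes: S sensitive, Y outcome, A admissible, X other.
nodes = ["S", "A", "Y", "X"]
scores = {pair: 0.1 for pair in combinations(nodes, 2)}
scores[("S", "Y")] = 5.0                  # strong but inadmissible dependency
print(ci_aware_private_tree(nodes, scores, [("S", "Y", {"A"})],
                            epsilon=1.0, rng=random.Random(0)))
```

Because the feasibility check in this sketch depends only on the graph structure and the declared constraints, not on the data, filtering candidates before sampling consumes no additional privacy budget. Only the edge scores would need to be computed from the data with bounded sensitivity.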