Differential privacy (DP) enables safe data release, and synthetic data generation has emerged as a common approach in recent years. Yet standard synthesizers preserve all dependencies in the data, including spurious correlations between sensitive attributes and outcomes; in fairness-critical settings, this reproduces unwanted bias. A principled remedy is to enforce conditional independence (CI) constraints, which encode domain knowledge or legal requirements that outcomes be independent of sensitive attributes once admissible factors are accounted for. DP synthesis typically proceeds in two phases: (i) a measurement step that privatizes selected marginals, often structured via maximum spanning trees (MSTs), and (ii) a reconstruction step that fits a probabilistic model consistent with the noisy marginals. We propose PrivCI, which enforces CI during the measurement step via a CI-aware greedy MST algorithm that integrates feasibility checks into Kruskal's construction under the exponential mechanism. Experiments on standard fairness benchmarks show that PrivCI achieves stronger fidelity and predictive accuracy than prior baselines while satisfying the specified CI constraints.
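The CI-aware measurement step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the mutual-information weight table, the `forbidden` edge set used as the feasibility check, and the inverse-CDF sampling loop are all assumptions made for the sketch.

```python
import math
import random

def ci_aware_mst(attrs, weights, forbidden, epsilon, sensitivity=1.0):
    """Kruskal-style greedy MST where each edge is selected with the
    exponential mechanism and edges violating a CI constraint (the
    `forbidden` set of attribute pairs) are excluded from the candidates.

    attrs:     list of attribute names (tree nodes)
    weights:   dict mapping (u, v) tuples to utility scores,
               e.g. noisy mutual-information estimates
    forbidden: set of frozenset({u, v}) pairs ruled out by CI constraints
    """
    # Union-find with path compression for Kruskal's cycle check.
    parent = {a: a for a in attrs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    while len(tree) < len(attrs) - 1:
        # Feasible candidates: edges that add no cycle and satisfy CI.
        cand = [(u, v) for (u, v) in weights
                if find(u) != find(v) and frozenset((u, v)) not in forbidden]
        if not cand:
            break  # CI constraints may leave the tree incomplete (a forest)

        # Exponential mechanism: sample an edge with probability
        # proportional to exp(epsilon * weight / (2 * sensitivity)).
        scores = [math.exp(epsilon * weights[e] / (2 * sensitivity))
                  for e in cand]
        r, acc = random.random() * sum(scores), 0.0
        for (u, v), s in zip(cand, scores):
            acc += s
            if acc >= r:
                parent[find(u)] = find(v)
                tree.append((u, v))
                break
    return tree
```

For example, with a sensitive attribute `S`, an admissible attribute `A`, and an outcome `Y`, forbidding the direct edge `{S, Y}` forces any dependence between `S` and `Y` to route through `A`, which is the structural form of the CI constraint the measurement step enforces.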