Differential privacy (DP) enables safe data release, and synthetic data generation has emerged as a common approach in recent years. Yet standard synthesizers preserve all dependencies in the data, including spurious correlations between sensitive attributes and outcomes. In fairness-critical settings, this reproduces unwanted bias. A principled remedy is to enforce conditional independence (CI) constraints, which encode domain knowledge or legal requirements that outcomes be independent of sensitive attributes once admissible factors are accounted for. DP synthesis typically proceeds in two phases: (i) a measurement step that privatizes selected marginals, often structured via minimum spanning trees (MSTs), and (ii) a reconstruction step that fits a probabilistic model consistent with the noisy marginals. We propose PrivCI, which enforces CI during the measurement step via a CI-aware greedy MST algorithm that integrates feasibility checks into Kruskal's construction under the exponential mechanism. Experiments on standard fairness benchmarks show that PrivCI achieves stronger fidelity and predictive accuracy than prior baselines while satisfying the specified CI constraints.
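To make the measurement-step idea concrete, the following is a minimal sketch of a CI-aware, Kruskal-style tree construction in which each edge is chosen via the exponential mechanism. The names `edge_scores` (e.g., per-pair utility such as mutual information), `violates_ci`, and `eps_per_edge` are illustrative assumptions, not the paper's actual API; the sketch only illustrates how feasibility checks (cycle and CI) can be folded into the greedy selection.

```python
import math
import random


def exp_mechanism_choice(candidates, scores, eps, sensitivity=1.0):
    """Sample one candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    max_s = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(eps * (s - max_s) / (2.0 * sensitivity)) for s in scores]
    r = random.random() * sum(weights)
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return cand
    return candidates[-1]


class UnionFind:
    """Standard union-find for Kruskal-style cycle detection."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def ci_aware_mst(n_attrs, edge_scores, violates_ci, eps_per_edge):
    """Greedily build a tree over attributes: at each step, restrict to edges
    that neither close a cycle nor violate a CI constraint (via the assumed
    `violates_ci` predicate), then sample one with the exponential mechanism."""
    uf = UnionFind(n_attrs)
    tree = []
    while len(tree) < n_attrs - 1:
        feasible = [(u, v) for (u, v) in edge_scores
                    if uf.find(u) != uf.find(v) and not violates_ci(tree, (u, v))]
        if not feasible:
            break  # CI constraints may rule out a spanning tree; return a forest
        scores = [edge_scores[e] for e in feasible]
        u, v = exp_mechanism_choice(feasible, scores, eps_per_edge)
        uf.union(u, v)
        tree.append((u, v))
    return tree
```

Under this sketch, the per-edge budget `eps_per_edge` would be one slice of the overall measurement budget, and the sensitivity of the score function determines the mechanism's scaling; both are modeling choices the paper itself would specify.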