Differential privacy (DP) enables safe data release, with synthetic data generation emerging as a common approach in recent years. Yet standard synthesizers preserve all dependencies in the data, including spurious correlations between sensitive attributes and outcomes. In fairness-critical settings, this reproduces unwanted bias. A principled remedy is to enforce conditional independence (CI) constraints, which encode domain knowledge or legal requirements that outcomes be independent of sensitive attributes once admissible factors are accounted for. DP synthesis typically proceeds in two phases: (i) a measurement step that privatizes selected marginals, often structured via maximum spanning trees (MSTs), and (ii) a reconstruction step that fits a probabilistic model consistent with the noisy marginals. We propose PrivCI, which enforces CI during the measurement step via a CI-aware greedy MST algorithm that integrates feasibility checks into Kruskal's construction under the exponential mechanism. Experiments on standard fairness benchmarks show that PrivCI achieves stronger fidelity and predictive accuracy than prior baselines while satisfying the specified CI constraints.
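The CI-aware measurement step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the utility scores, the per-step budget split, and the representation of CI constraints as a set of forbidden attribute pairs are all simplifying assumptions. At each Kruskal-style step, one feasible edge (acyclic and not CI-forbidden) is sampled via the exponential mechanism.

```python
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    weights = [math.exp(epsilon * utility[c] / (2 * sensitivity)) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

class UnionFind:
    """Disjoint-set structure for Kruskal's cycle checks."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def ci_aware_mst(n_attrs, utility, forbidden, epsilon):
    """Greedy CI-aware tree construction (illustrative sketch).

    utility   : dict mapping attribute-pair edges to scores
                (e.g., privatized mutual information)
    forbidden : set of edges ruled out by the CI constraints,
                such as direct sensitive-attribute/outcome links
    epsilon   : privacy budget, split evenly across steps
                (a naive composition assumption)
    """
    uf = UnionFind(n_attrs)
    tree = []
    per_step_eps = epsilon / max(n_attrs - 1, 1)
    while len(tree) < n_attrs - 1:
        # Feasibility check folded into edge selection: an edge is a
        # candidate only if it avoids cycles AND respects CI constraints.
        feasible = [e for e in utility
                    if e not in forbidden
                    and uf.find(e[0]) != uf.find(e[1])]
        if not feasible:
            break  # constraints may leave the graph disconnected
        edge = exponential_mechanism(feasible, utility, per_step_eps)
        uf.union(*edge)
        tree.append(edge)
    return tree

# Toy example: 4 attributes; edge (0, 3) is a hypothetical
# sensitive-attribute/outcome link excluded by a CI constraint.
utility = {(0, 1): 0.9, (1, 2): 0.7, (2, 3): 0.6, (0, 3): 0.95}
tree = ci_aware_mst(4, utility, forbidden={(0, 3)}, epsilon=1.0)
```

Folding the feasibility filter into the candidate set (rather than rejecting sampled edges after the fact) keeps the exponential mechanism's selection probabilities normalized over exactly the admissible edges at each step.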