Causal-Privacy Audit Workflow for Synthetic and Distilled Data in Dropout Support

Background: Synthetic and distilled student data are increasingly used to enable privacy-conscious learning analytics, yet their suitability for decision-facing institutional support remains uncertain. In dropout support, generated data must preserve not only predictive utility and distributional resemblance but also the financial-status evidence used to guide advising, payment-plan assistance, and scholarship-related decisions.

Method: This study introduces CaP-Eval, a decision-facing causal-privacy audit workflow that evaluates generated student data under a fixed estimand, a timing-aware adjustment design, a fixed estimator set, and an empirical privacy-governance screen. The workflow compares original, distilled, adversarial synthetic (TabularGNet), statistical synthetic (Gaussian Copula), and privacy-oriented DPGNet-generated data on predictive utility, treatment-effect fidelity, robustness to alternative estimators, and local training-record proximity.

Results: DPGNet and distilled data preserved the original financial-status treatment-effect structure more reliably than the TabularGNet and Gaussian Copula baselines. DPGNet preserved full direction and rank agreement across epsilon levels; epsilon = 10 produced the smallest deviations from the original inverse probability weighting (IPW) and double machine learning (DML) estimates among the generated datasets, while epsilon = 1 and epsilon = 5 amplified several financial-status contrasts. Distilled data remained highly faithful but retained the strongest local training-record proximity signal. TabularGNet preserved qualitative directions with moderate attenuation, and Gaussian Copula compressed effect magnitudes.

Conclusions: Predictive utility, privacy orientation, empirical disclosure signals, and causal fidelity diverged. Generated student data therefore require joint audits of direction, magnitude, overlap, and release-governance risk before decision use.
