The rapid adoption of digital technologies has greatly increased the volume of real-world data (RWD) in education. While these data offer significant opportunities for advancing learning analytics (LA), their secondary use for research is constrained by privacy concerns. Differentially private synthetic data generation is regarded as the gold-standard approach to sharing sensitive data, yet studies on the private synthesis of educational data remain very scarce and rely predominantly on large, low-dimensional open datasets. Educational RWD, however, are typically high-dimensional and small in sample size, so the potential of private synthesis in this setting remains underexplored. Moreover, because educational practice is inherently iterative, data sharing is continual rather than one-off, which makes a traditional one-shot synthesis approach suboptimal. To address these challenges, we propose the Cyclic Adaptive Private Synthesis (CAPS) framework and evaluate it on authentic RWD. By enabling the iterative sharing of RWD, CAPS not only fosters open science but also offers rich opportunities for design-based research (DBR), thereby amplifying the impact of LA. Our case study on actual RWD demonstrates that CAPS outperforms a one-shot baseline, while also highlighting challenges that warrant further investigation. Overall, this work offers a crucial first step towards the privacy-preserving sharing of educational RWD and expands the possibilities for open science and DBR in LA.