Secondary use of the growing volume of real-world data (RWD) in education offers significant opportunities for research, yet the privacy practices intended to enable third-party access to such data are rarely evaluated for their implications for downstream analyses. As a result, problems introduced by otherwise standard privacy practices may go unnoticed. To address this gap, we investigate issues arising from common practices by assessing (1) the re-identification risk of fine-grained RWD, (2) how communicating such risk influences learners' privacy behaviour, and (3) the sensitivity of downstream analytical conclusions to the resulting changes in the data. We focus on these practices because re-identification risk and stakeholder communication can jointly influence the data shared with third parties. We find that substantial re-identification risk in RWD, when communicated to stakeholders, can induce opt-outs and non-self-disclosure behaviours. Sensitivity analysis demonstrates that these behavioural changes can meaningfully alter the shared data, limiting the validity of secondary-use findings. We conceptualise this phenomenon as the third-party access effect (3PAE) and discuss its implications for trustworthy secondary use of educational RWD.