The ethical and legal imperative to share research data without causing harm requires careful attention to privacy risks. While mounting evidence demonstrates that data sharing benefits science, legitimate concerns persist that personal information could leak, leading to reidentification and subsequent harm. We reviewed the metadata accompanying neuroimaging datasets openly available on OpenNeuro. The datasets come from heterogeneous studies involving participants across the lifespan, from children to older adults, with and without clinical diagnoses, and include associated clinical score data. Using metaprivBIDS (https://github.com/CPernet/metaprivBIDS), a software application for BIDS-compliant TSV/JSON files that computes and reports several privacy metrics (k-anonymity, k-global, l-diversity, SUDA, PIF), we found that privacy is generally well maintained and that serious vulnerabilities are rare. Nonetheless, issues were identified in nearly all datasets and warrant mitigation. Notably, clinical score data (e.g., neuropsychological results) posed minimal reidentification risk, whereas demographic variables (age, sex assigned at birth, sexual orientation, race, income, and geolocation) represented the principal privacy vulnerabilities. We outline practical measures to address these risks, enabling safer data sharing practices.
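To make the first of the metrics named above concrete, the following is a minimal sketch of k-anonymity: the size of the smallest group of participants sharing the same combination of quasi-identifier values. It is not the metaprivBIDS implementation or API. The function names (`k_anonymity`, `unique_rows`, `coarsen_age`), the column names, and the five-row table are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def _key(row, quasi_identifiers):
    """Tuple of quasi-identifier values that defines a row's equivalence class."""
    return tuple(row[q] for q in quasi_identifiers)


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest equivalence class in the table."""
    if not rows:
        raise ValueError("rows must not be empty")
    groups = Counter(_key(row, quasi_identifiers) for row in rows)
    return min(groups.values())


def unique_rows(rows, quasi_identifiers):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    groups = Counter(_key(row, quasi_identifiers) for row in rows)
    return [row for row in rows if groups[_key(row, quasi_identifiers)] == 1]


def coarsen_age(rows, width):
    """Generalize age into bins of the given width (e.g. 24 -> 20 for width 10)."""
    return [{**row, "age": (row["age"] // width) * width} for row in rows]


# Hypothetical participants table, shaped like a small participants.tsv.
participants = [
    {"participant_id": "sub-01", "age": 24, "sex": "F"},
    {"participant_id": "sub-02", "age": 24, "sex": "F"},
    {"participant_id": "sub-03", "age": 31, "sex": "M"},
    {"participant_id": "sub-04", "age": 31, "sex": "M"},
    {"participant_id": "sub-05", "age": 26, "sex": "F"},
]

quasi = ["age", "sex"]
print(k_anonymity(participants, quasi))                   # 1: sub-05 is unique
print(k_anonymity(coarsen_age(participants, 10), quasi))  # 2: after binning age by decade
```

In this toy table, exact age makes one participant unique (k = 1), while reporting age in ten-year bins raises k to 2. This is the kind of generalization of demographic variables that the mitigation measures are concerned with.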