Assessing metadata privacy in neuroimaging

The ethical and legal imperative to share research data without causing harm requires careful attention to privacy risks. While mounting evidence demonstrates that data sharing benefits science, legitimate concerns persist that leaked personal information could lead to reidentification and subsequent harm. We reviewed the metadata accompanying neuroimaging datasets from heterogeneous studies openly available on OpenNeuro. These datasets involve participants across the lifespan, from children to older adults, with and without clinical diagnoses, and include associated clinical score data. We used metaprivBIDS (https://github.com/CPernet/metaprivBIDS), a software application for BIDS-compliant tsv/json files that computes and reports several privacy metrics (k-anonymity, k-global, l-diversity, SUDA, PIF). We found that privacy is generally well maintained and that serious vulnerabilities are rare. Nonetheless, issues were identified in nearly all datasets and warrant mitigation. Notably, clinical score data (e.g., neuropsychological results) posed minimal reidentification risk, whereas demographic variables (age, sex assigned at birth, sexual orientation, race, income, and geolocation) represented the principal privacy vulnerabilities. We outline practical measures to address these risks, enabling safer data sharing practices.
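To make two of the metrics named above concrete, here is a minimal sketch of k-anonymity and l-diversity computed over rows shaped like a BIDS `participants.tsv`. It is not the metaprivBIDS implementation or API: the function names, the column names, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rows, shaped like a BIDS participants.tsv (illustrative only).
participants = [
    {"participant_id": "sub-01", "age": 24, "sex": "F", "diagnosis": "control"},
    {"participant_id": "sub-02", "age": 24, "sex": "F", "diagnosis": "patient"},
    {"participant_id": "sub-03", "age": 31, "sex": "M", "diagnosis": "control"},
    {"participant_id": "sub-04", "age": 31, "sex": "M", "diagnosis": "control"},
    {"participant_id": "sub-05", "age": 67, "sex": "F", "diagnosis": "patient"},
]


def _key(row, quasi_identifiers):
    """Combine the quasi-identifier values of one row into a hashable key."""
    return tuple(row[q] for q in quasi_identifiers)


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    if not rows:
        raise ValueError("rows must not be empty")
    groups = Counter(_key(row, quasi_identifiers) for row in rows)
    return min(groups.values())


def unique_participants(rows, quasi_identifiers):
    """IDs of rows whose quasi-identifier combination occurs exactly once."""
    groups = Counter(_key(row, quasi_identifiers) for row in rows)
    return [
        row["participant_id"]
        for row in rows
        if groups[_key(row, quasi_identifiers)] == 1
    ]


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values within any quasi-identifier group."""
    if not rows:
        raise ValueError("rows must not be empty")
    values = defaultdict(set)
    for row in rows:
        values[_key(row, quasi_identifiers)].add(row[sensitive])
    return min(len(v) for v in values.values())


if __name__ == "__main__":
    qi = ["age", "sex"]
    print("k-anonymity:", k_anonymity(participants, qi))
    print("unique rows:", unique_participants(participants, qi))
    print("l-diversity:", l_diversity(participants, qi, "diagnosis"))
```

In this toy table, the combination of age and sex singles out one participant, so k is 1. The 31-year-old male group has k of 2 but only one diagnosis value, so knowing that someone belongs to it reveals their diagnosis, which is the gap l-diversity is meant to expose.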
