The open data ecosystem is vulnerable to disclosure risks. Although datasets are anonymized at release, the prevalent release-and-forget model leaves data defenders blind to privacy issues that arise after a dataset is released. One such issue is the disclosure risk introduced by newly released datasets, which may compromise the privacy of the data subjects in existing anonymous open datasets. In this paper, we first examine some of these pitfalls through examples we observed during a red teaming exercise, and then envision other possible vulnerabilities in this context. We also discuss proactive risk monitoring, including developing a collection of highly susceptible open datasets and a visual analytics workflow that empowers data defenders to undertake dynamic risk calibration strategies.