Transparency and accountability are indispensable principles for modern data protection, from both legal and technical viewpoints. Regulations such as the GDPR therefore require that specific transparency information be provided, including, e.g., purpose specifications, storage periods, and legal bases for personal data processing. However, it has repeatedly been shown that this information is all too often effectively hidden in legalese privacy policies, which hinders data subjects from exercising their rights. This paper presents a novel approach to enable large-scale transparency information analysis across service providers, leveraging machine-readable formats and graph data science methods. More specifically, we propose a general approach for building a transparency analysis platform (TAP) that can be used to identify data transfers empirically, to provide evidence-based analyses of sharing clusters among more than 70 real-world data controllers, and even to simulate network dynamics using synthetic transparency information for large-scale data-sharing scenarios. We contribute the general approach for advanced transparency information analysis, an open-source architecture and implementation in the form of a queryable analysis platform, and versatile analysis examples. These contributions pave the way for more transparent data processing for data subjects and for evidence-based enforcement processes for data protection authorities. Future work can build upon our contributions to gain further insights into data-sharing practices that have so far remained hidden.