Despite proportionality being one of the tenets of data protection laws, we currently lack a robust analytical framework to evaluate the reach of modern data collections and the network effects at play. Here, we propose a graph-theoretic model and notions of node- and edge-observability to quantify the reach of networked data collections. First, we derive closed-form expressions for our metrics and quantify the impact of the graph's structure on observability. Second, using our model, we quantify how (1) from 270,000 compromised accounts, Cambridge Analytica collected 68.0M Facebook profiles; (2) by surveilling 0.01\% of the nodes in a mobile phone network, a law-enforcement agency could observe 18.6\% of all communications; and (3) an app installed on 1\% of smartphones could monitor the location of half of the London population through close-proximity tracing. Better quantifying the reach of data collection mechanisms is essential to evaluate their proportionality.
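As a rough illustration of the two metrics named above, the following minimal sketch computes node- and edge-observability on a small undirected graph. The abstract does not state the formal definitions, so the sketch assumes that a node is observed if it is compromised or adjacent to a compromised node, and that an edge is observed if at least one of its endpoints is compromised. The function name `observability` and the toy graph are hypothetical and are not taken from the paper.

```python
def observability(edges, compromised):
    """Return (node_observability, edge_observability) for an undirected graph.

    Assumed definitions (not stated in the abstract):
      - a node is observed if it is compromised or has a compromised neighbor;
      - an edge is observed if at least one endpoint is compromised.
    """
    compromised = set(compromised)

    # Collect every node that appears in the edge list or the compromised set.
    nodes = set(compromised)
    for u, v in edges:
        nodes.update((u, v))

    observed_nodes = set(compromised)
    observed_edges = 0
    for u, v in edges:
        if u in compromised or v in compromised:
            observed_edges += 1
            observed_nodes.update((u, v))

    return len(observed_nodes) / len(nodes), observed_edges / len(edges)


# Toy graph: node 0 is a hub linked to four others, plus one edge between 1 and 2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]

hub = observability(edges, {0})   # compromise the hub
leaf = observability(edges, {3})  # compromise a peripheral node

print("hub: ", hub)
print("leaf:", leaf)
```

Under these assumed definitions, compromising the single hub exposes every node and four of the five edges, whereas compromising a peripheral node exposes only itself, the hub, and one edge. This is the kind of structural effect the model is meant to quantify.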