Human data forms the backbone of machine learning (ML), so data protection laws have strong bearing on how ML systems are governed. Because most requirements in data protection laws are triggered by the processing of personal data, organizations have an incentive to keep their data out of legal scope. This makes the development and application of certain privacy-preserving techniques, which we call data protection techniques, an important strategy for ML compliance. In this paper, we examine the impact of a rhetoric that deems data wrapped in these techniques to be "good-to-go". We show how applying these techniques in the development of ML systems can further support individual monitoring and data consolidation. Examples range from private set intersection during dataset curation to homomorphic encryption and federated learning during model computation. Because data accumulation is at the core of how the ML pipeline is configured, we argue that data protection techniques are often instrumentalized to support infrastructures of surveillance rather than to protect the individuals associated with the data. Finally, we propose technology and policy strategies to evaluate data protection techniques in light of the protections they actually confer. We conclude by highlighting the role that technologists might play in devising policies that combat surveillance ML technologies.