Machine learning and data systems increasingly function as infrastructures of memory: they ingest, store, and operationalize traces of personal, political, and cultural life. Yet contemporary governance demands credible forms of forgetting, from GDPR-backed deletion to harm mitigation and the removal of manipulative content, while technical infrastructures are optimized to retain, replicate, and reuse. This work argues that "forgetting" in computational systems cannot be reduced to a single operation (e.g., record deletion) and should instead be treated as a sociotechnical practice with distinct mechanisms and consequences. We clarify a vocabulary that separates erasure (removing or disabling access to data artifacts), unlearning (interventions that bound or remove a data point's influence on learned parameters and outputs), exclusion (upstream non-collection and omission), and forgetting as an umbrella term spanning agency, temporality, reversibility, and scale. Building on examples from machine unlearning, semantic dependencies in data management, participatory data modeling, and manipulation at scale, we show how forgetting can simultaneously protect rights and enable silencing. We propose reframing unlearning as a first-class capability in knowledge infrastructures, evaluated not only by compliance or utility retention, but by its governance properties: transparency, accountability, and epistemic justice.