The World Wide Web is a ubiquitous source of information and a primary resource for countless people, and it accumulates vast amounts of data contributed by internet users worldwide. However, when this data is scraped, indexed, and reused for purposes such as search engine indexing and, notably, AI model training, its use often diverges from what its contributors originally intended. The rise of generative AI has intensified concerns about data privacy and copyright infringement. Unfortunately, the web's current architecture lacks adequate mechanisms for key actions such as withdrawing consent or asserting copyright over one's data. Some companies offer voluntary measures, such as crawler access restrictions, but these often remain out of reach for individual users. To empower online users to exercise their rights and to help companies comply with regulations, this paper introduces a user-controlled consent tagging framework for online data. The framework combines the extensibility of HTTP and HTML with the decentralized nature of distributed ledger technology. With it, users can tag their online data at the time of transmission, then later track that data and request withdrawal of consent from the data holders. A proof-of-concept implementation demonstrates the feasibility of the framework. This work can help strengthen user consent, privacy, and copyright protection on the modern internet, and it lays the groundwork for building a more responsible, user-centric web ecosystem.
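The abstract describes the tagging mechanism only at a high level. The following is a minimal Python sketch under our own assumptions, not the paper's implementation. A hypothetical `Consent-Tag` HTTP header and a hypothetical `consent-tag` HTML `<meta>` element both carry a JSON tag holding an owner, a SHA-256 digest of the content, and a consent state. A toy single-node, hash-chained, append-only log (`ConsentLedger`, also hypothetical) stands in for the distributed ledger on which grant and withdrawal records would be kept.

```python
import hashlib
import html
import json
from dataclasses import dataclass, field
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first ledger entry


def content_digest(data: bytes) -> str:
    """SHA-256 fingerprint used to identify a piece of transmitted content."""
    return hashlib.sha256(data).hexdigest()


def consent_tag(data: bytes, owner: str, consent: str) -> dict:
    """Build a consent tag; the field names are illustrative assumptions."""
    return {"owner": owner, "digest": content_digest(data), "consent": consent}


def consent_headers(data: bytes, owner: str, consent: str) -> dict:
    """Carry the tag in a hypothetical (unregistered) HTTP header."""
    tag = consent_tag(data, owner, consent)
    return {"Consent-Tag": json.dumps(tag, sort_keys=True)}


def consent_meta(data: bytes, owner: str, consent: str) -> str:
    """Carry the same tag in a hypothetical HTML <meta> element."""
    tag = consent_tag(data, owner, consent)
    payload = html.escape(json.dumps(tag, sort_keys=True))  # escapes quotes for the attribute
    return f'<meta name="consent-tag" content="{payload}">'


def _entry_hash(prev: str, record: dict) -> str:
    """Chain each record to its predecessor so edits to history are detectable."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev + body).encode()).hexdigest()


@dataclass
class ConsentLedger:
    """Toy append-only log standing in for a distributed ledger (one node, no consensus)."""
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _entry_hash(prev, record)
        self.entries.append({"record": dict(record), "prev": prev, "hash": entry_hash})
        return entry_hash

    def status(self, digest: str) -> Optional[str]:
        """Return the most recent consent state recorded for a content digest."""
        latest = None
        for entry in self.entries:
            if entry["record"]["digest"] == digest:
                latest = entry["record"]["consent"]
        return latest

    def verify(self) -> bool:
        """Recompute the hash chain; return False if any stored record was altered."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    post = b"A photo caption the user uploads"
    print(consent_headers(post, owner="alice", consent="granted"))
    ledger = ConsentLedger()
    ledger.append(consent_tag(post, "alice", "granted"))
    # Later, the user withdraws consent by appending a new record, not by editing history.
    ledger.append(consent_tag(post, "alice", "withdrawn"))
    print(ledger.status(content_digest(post)))
    print(ledger.verify())
```

In this sketch, withdrawal is modeled as a new record rather than a deletion, so anyone holding the log can read the current consent state while the chain check still exposes tampering. A real deployment would additionally need signatures binding each record to its owner and an actual multi-party ledger.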