Many recent technological advances (e.g. ChatGPT and search engines) are possible only because of massive amounts of user-generated data produced through user interactions with computing systems or scraped from the web (e.g. behavior logs, user-generated content, and artwork). However, data producers have little say in what data is captured, how it is used, or who it benefits. Organizations with the ability to access and process this data, e.g. OpenAI and Google, possess immense power in shaping the technology landscape. By synthesizing related literature that reconceptualizes the production of data for computing as ``data labor'', we outline opportunities for researchers, policymakers, and activists to empower data producers in their relationship with tech companies, e.g advocating for transparency about data reuse, creating feedback channels between data producers and companies, and potentially developing mechanisms to share data's revenue more broadly. In doing so, we characterize data labor with six important dimensions - legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap - based on the parallels between data labor and various other types of labor in the computing literature.
翻译:许多近期的技术进步(例如ChatGPT和搜索引擎)之所以成为可能,仅源于用户与计算系统交互过程中产生的海量用户生成数据,或从网络抓取的数据(如行为日志、用户生成内容和艺术作品)。然而,数据生产者对于哪些数据被捕获、如何使用以及利益归属几乎没有话语权。具备数据访问和处理能力的组织(如OpenAI和谷歌)在塑造技术格局方面拥有巨大权力。通过综合将用于计算的数据生产重新概念化为"数据劳动"的相关文献,我们为研究者、政策制定者和活动家勾勒出赋能数据生产者在与科技公司关系中增强权力的机会——例如倡导数据重用的透明度、建立数据生产者与公司之间的反馈渠道,以及开发更广泛分享数据收入的潜在机制。在此过程中,我们基于数据劳动与计算文献中多种其他类型劳动之间的相似性,从六个关键维度——可解释性、最终用途认知、协作需求、开放性、可替代性及生计重叠度——对数据劳动进行表征。