Generative AI and the Digital Commons

Many generative foundation models (or GFMs) are trained on publicly available data and use public infrastructure, but 1) may degrade the "digital commons" that they depend on, and 2) do not have processes in place to return value captured to data producers and stakeholders. Existing conceptions of data rights and protection (focusing largely on individually-owned data and associated privacy concerns) and copyright or licensing-based models offer some instructive priors, but are ill-suited for the issues that may arise from models trained on commons-based data. We outline the risks posed by GFMs and why they are relevant to the digital commons, and propose numerous governance-based solutions that include investments in standardized dataset/model disclosure and other kinds of transparency when it comes to generative models' training and capabilities, consortia-based funding for monitoring/standards/auditing organizations, requirements or norms for GFM companies to contribute high quality data to the commons, and structures for shared ownership based on individual or community provision of fine-tuning data.

翻译：许多生成式基础模型（或称GFM）依赖公开可用数据和公共基础设施进行训练，但存在两大问题：1）可能会损害其所依赖的"数字公地"；2）缺乏将捕获的价值返还给数据生产者和利益相关方的流程。现有的数据权利与保护概念（主要聚焦于个人拥有数据及相关隐私问题）、基于版权或许可的模式虽提供了一些可借鉴的先例，但难以适用于基于公地数据训练的模型可能引发的问题。我们概述了GFM带来的风险及其与数字公地的关联性，并提出多项基于治理的解决方案，包括：对标准化数据集/模型披露及生成模型训练与能力方面的其他透明度要求进行投资；建立联盟式资助机制以支持监测/标准/审计组织；要求或规范GFM企业向公地贡献高质量数据；以及基于个人或社区提供的微调数据构建共享所有权结构。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

Meta最新WWW2022《联邦计算导论》教程，附77页ppt

专知会员服务

60+阅读 · 2022年5月5日

【Meta AI】多模态理解研究进展，Advances in multimodal understanding research at Meta AI

专知会员服务

68+阅读 · 2022年3月20日

【斯坦福HAI白皮书】关于更新国家人工智能研发战略规划的建议，Recommendations on Updating the National Artificial Intelligence Research and Development Strategic Plan

专知会员服务

43+阅读 · 2022年3月15日

【牛津大学】电子医疗记录的生成式对抗网络:应用、评估措施和数据来源综述，A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

专知会员服务

24+阅读 · 2022年3月15日