Democratization of AI means not only that people can freely use AI, but also that people can collectively decide how AI is to be used. In particular, collective decision-making power is required to redress the negative externalities from the development of increasingly advanced AI systems, including degradation of the digital commons and unemployment from automation. The rapid pace of AI development and deployment currently leaves little room for this power. Monopolized in the hands of private corporations, the development of the most capable foundation models has proceeded largely without public input. No mechanism is currently in place to ensure that the economic value generated by such models is redistributed to account for their negative externalities. The citizens who have generated the data necessary to train models have no say in how their data are used. In this work, we propose that a public data trust assert control over training data for foundation models. In particular, this trust should scrape the internet as a digital commons and license the resulting data to commercial model developers in exchange for a percentage cut of revenues from deployment. First, we argue in detail for the existence of such a trust, and discuss its feasibility and potential risks. Second, we detail a number of ways for a data trust to incentivize model developers to use training data only from the trust: a mix of verification mechanisms, potential regulatory action, and positive incentives. We conclude by highlighting other potential benefits of our proposed data trust and connecting our work to ongoing efforts in data and compute governance.