As AI tools such as ChatGPT grow increasingly popular, we are entering a true AI era, and we can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arises: should AI tools share revenue with their training data providers, in addition to traditional stakeholders and shareholders? The answer is yes. Large AI tools, such as large language models, require ever more and better-quality data to keep improving, but current copyright laws limit their access to many types of data. Sharing revenue between AI tools and their data providers could transform the current hostile, zero-sum relationship between AI tools and the majority of copyrighted-data owners into a collaborative and mutually beneficial one. Such a shift is necessary to foster a virtuous cycle among AI tools, their users, and data providers that drives AI technology forward and builds a healthy AI ecosystem. However, current revenue-sharing business models will not work for AI tools in the forthcoming AI era, because the most widely used metrics for website-based traffic and actions, such as clicks, will be replaced by new metrics for generative AI tools, such as prompts and cost per prompt. A completely new revenue-sharing business model is needed, one that is almost independent of the AI tools themselves and easy to explain to data providers; it must establish a prompt-based scoring system that measures the data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers of AI tools, based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers through such a scoring system would encourage more data owners to participate in the revenue-sharing program. The result would be a utilitarian AI era in which all parties benefit.