As AI tools such as ChatGPT grow increasingly popular, we are entering a true AI era, and exceptional AI tools can be expected to earn considerable profits before long. A crucial question arises: should AI tools share revenue with their training data providers in addition to traditional stakeholders and shareholders? The answer is yes. Large AI tools, such as large language models, require ever more and better-quality data to keep improving, but current copyright laws limit their access to many types of data. Sharing revenue between AI tools and their data providers could transform the current hostile, zero-sum relationship between AI tools and most owners of copyrighted data into a collaborative and mutually beneficial one. Such a shift is necessary to foster a virtuous cycle among AI tools, their users, and data providers, one that drives AI technology forward and builds a healthy AI ecosystem. However, current revenue-sharing business models will not work for AI tools in the forthcoming AI era, because the most widely used metrics for website traffic and actions, such as clicks, will be replaced by new metrics for generative AI tools, such as prompts and cost per prompt. A completely new revenue-sharing business model is needed. It must be almost independent of the AI tools themselves and easy to explain to data providers, and it requires a prompt-based scoring system that measures the data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers of AI tools based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers through such a scoring system would encourage more data owners to participate in the revenue-sharing program. The result would be a utilitarian AI era in which all parties benefit.
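To make the idea of a prompt-based scoring system concrete, the following is a minimal sketch and not the paper's actual method. It assumes a plain bag-of-words cosine similarity as a stand-in for the content similarity model and omits the classification step entirely. The function names (`provider_scores`, `revenue_shares`) and the proportional payout rule are hypothetical, introduced only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def provider_scores(prompts, provider_docs):
    """Sum, per provider, the similarity of every prompt to that provider's content.

    Assumption: each provider's documents are merged into one word-count vector.
    """
    vectors = {
        name: Counter(tokenize(" ".join(docs)))
        for name, docs in provider_docs.items()
    }
    scores = {name: 0.0 for name in provider_docs}
    for prompt in prompts:
        prompt_vector = Counter(tokenize(prompt))
        for name, vector in vectors.items():
            scores[name] += cosine_similarity(prompt_vector, vector)
    return scores


def revenue_shares(scores, revenue_pool):
    """Split a revenue pool in proportion to the scores (hypothetical payout rule)."""
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: revenue_pool * score / total for name, score in scores.items()}
```

Calling `provider_scores` with a list of user prompts and a mapping from provider name to that provider's documents yields one engagement score per provider, which `revenue_shares` then converts into payouts from a given pool. Because the scores are computed only from prompts and provider content, the sketch does not depend on the internals of any particular AI tool.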