Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain

"Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies, including conversational text chatbots like ChatGPT, image generators like Midjourney and DALL-E, coding assistants like GitHub Copilot, and systems that compose music and create videos. These systems behave differently and raise different legal issues. The second problem is that copyright law is notoriously complicated, and generative-AI systems manage to touch on a great many corners of it: authorship, similarity, direct and indirect liability, fair use, and licensing, among much else. These issues cannot be analyzed in isolation, because there are connections everywhere. In this Article, we aim to bring order to the chaos. To do so, we introduce the generative-AI supply chain: an interconnected set of stages that transform training data (millions of pictures of cats) into generations (a new, potentially never-seen-before picture of a cat that has never existed). Breaking down generative AI into these constituent stages reveals all of the places at which companies and users make choices that have copyright consequences. It enables us to trace the effects of upstream technical designs on downstream uses, and to assess who in these complicated sociotechnical systems bears responsibility for infringement when it happens. Because we engage so closely with the technology of generative AI, we are able to shed more light on the copyright questions. We do not give definitive answers as to who should and should not be held liable. Instead, we identify the key decisions that courts will need to make as they grapple with these issues, and point out the consequences that would likely flow from different liability regimes.


