"Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies, including conversational text chatbots like ChatGPT, image generators like Midjourney and DALL-E, coding assistants like GitHub Copilot, and systems that compose music and create videos. These systems behave differently and raise different legal issues. The second problem is that copyright law is notoriously complicated, and generative-AI systems manage to touch on a great many corners of it: authorship, similarity, direct and indirect liability, fair use, and licensing, among much else. These issues cannot be analyzed in isolation, because there are connections everywhere. In this Article, we aim to bring order to the chaos. To do so, we introduce the generative-AI supply chain: an interconnected set of stages that transform training data (millions of pictures of cats) into generations (a new, potentially never-seen-before picture of a cat that has never existed). Breaking down generative AI into these constituent stages reveals all of the places at which companies and users make choices that have copyright consequences. It enables us to trace the effects of upstream technical designs on downstream uses, and to assess who in these complicated sociotechnical systems bears responsibility for infringement when it happens. Because we engage so closely with the technology of generative AI, we are able to shed more light on the copyright questions. We do not give definitive answers as to who should and should not be held liable. Instead, we identify the key decisions that courts will need to make as they grapple with these issues, and point out the consequences that would likely flow from different liability regimes.