Who Owns the Knowledge? Copyright, GenAI, and the Future of Academic Publishing

From arXiv: The second version substantially revises the original preprint by expanding the legal analysis, adding coverage of the new technical standard (RSL 1.0), and removing substantial material that lacked direct relevance to copyright and AI training.

The integration of generative artificial intelligence (GenAI) and large language models (LLMs) into scientific research and higher education presents a paradigm shift, offering transformative opportunities while raising profound ethical, legal, and regulatory questions. This study examines the intersection of AI and science, with a specific focus on the challenges posed to copyright law and the principles of open science. The author argues that current regulatory frameworks in key jurisdictions (the United States, China, the European Union, and the United Kingdom), while aiming to foster innovation, contain significant gaps, particularly concerning the use of copyrighted works and open science outputs for AI training. Widely adopted licensing mechanisms, such as Creative Commons, fail to adequately address the nuances of AI training, and the pervasive lack of attribution within AI systems fundamentally challenges established notions of originality. Although current doctrine treats AI training as potentially fair use, the author argues that this treatment is inadequate and that copyright holders should retain explicit opt-out rights regardless of fair use doctrine. The author therefore advocates upholding authors' rights to refuse the use of their works for AI training and proposes that universities assume a leading role in shaping responsible AI governance. The study concludes that a harmonized international legislative effort is urgently needed to ensure transparency, protect intellectual property, and prevent the emergence of an oligopolistic market structure that could prioritize commercial profit over scientific integrity and equitable knowledge production. This is a substantially expanded and revised version of a work originally presented at the 20th International Conference on Scientometrics & Informetrics (Kochetkov, 2025).
