Reproducibility is the New Copyleft: Defining AGI-oriented Reproducible Builds

Copyleft, as implemented in licenses such as the GNU General Public License, was a legal hack that used copyright to guarantee user freedom by tying the availability of source code to every act of distribution. Its normative force rested on an implicit technical premise: that source code and object code stand in a well-defined, humanly auditable, and reproducible relationship. Large language models and, prospectively, Artificial General Intelligence (AGI) systems systematically violate this premise. The artifacts jointly required to reconstruct a model (code, data, weights, hyperparameters, toolchain, and hardware configuration) are each subject to independent legal, technical, and economic constraints that no current open-source framework fully resolves. Sufficiently capable AI systems can also rewrite licensed source into functionally equivalent derivatives stripped of their original obligations, a form of laundering against which copyleft has no effective defense. This paper argues that a functional analogue of copyleft for AGI must be grounded not in share-alike clauses over code, but in reproducible builds: a practice guaranteeing bit-exact reconstructability from declared inputs. We review the logic of copyleft, critically examine Maffulli's "Second Liberation" thesis, according to which AI fulfills Stallman's dream, and show that the argument collapses unless AGI systems are themselves reproducible. Drawing on the Open Source AI Definition (OSAID), the Model Openness Framework (MOF), OpenMDW, and deterministic-inference research, we define seven requirements for AGI-oriented reproducible builds. We further argue that the Model Context Protocol (MCP) and analogous AI-to-AI coupling mechanisms constitute a new dynamic linking layer for which copyleft-style licensing is ill-suited, and that Masnick's "protocols, not platforms" framework offers a more promising governance template.
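As a minimal illustration of what "bit-exact reconstructability from declared inputs" could mean operationally, the sketch below hashes a manifest of declared inputs in a canonical encoding, rebuilds an artifact from that manifest, and accepts the rebuild only if its digest matches the published one. Everything here is hypothetical: the manifest fields are placeholders mirroring the artifact categories listed above, and `toy_build` merely stands in for a training run that is assumed to be a pure function of its inputs. Real training is only this well-behaved under deterministic kernels, fixed seeds, and a pinned toolchain and hardware configuration.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """SHA-256 of the declared inputs, using key-sorted JSON so that
    logically identical manifests always produce the same digest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def toy_build(manifest: dict) -> bytes:
    """Hypothetical stand-in for training: output bytes ("weights") that
    depend only on the manifest. This is an assumption, not a real pipeline."""
    return hashlib.sha256(manifest_digest(manifest).encode("ascii") + b"weights").digest()


def verify_rebuild(manifest: dict, published_artifact_sha256: str) -> bool:
    """Rebuild from declared inputs and compare bit-exactly via the digest."""
    rebuilt = toy_build(manifest)
    return hashlib.sha256(rebuilt).hexdigest() == published_artifact_sha256


# Placeholder declared inputs; none of these values refer to a real model.
example_manifest = {
    "code": "git-commit-placeholder",
    "data": "dataset-digest-placeholder",
    "hyperparameters": {"lr": 3e-4, "seed": 1234, "steps": 1000},
    "toolchain": "toolchain-versions-placeholder",
    "hardware": "hardware-config-placeholder",
}

# The publisher releases the artifact digest alongside the manifest.
published = hashlib.sha256(toy_build(example_manifest)).hexdigest()

# Changing any declared input (here, the seed) yields a different artifact.
tampered = dict(example_manifest, hyperparameters={"lr": 3e-4, "seed": 1235, "steps": 1000})

print(verify_rebuild(example_manifest, published))  # True
print(verify_rebuild(tampered, published))          # False
```

The design choice worth noting is that verification compares digests of the output rather than trusting any claim about the process: a third party holding only the manifest can confirm or refute the published artifact, which is the property the abstract proposes as a functional analogue of copyleft.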

翻译：著佐权（Copyleft），如GNU通用公共许可证所实现的，是一种利用版权保障用户自由的法律策略，通过将源代码可用性与每一次分发行为绑定来实现其效力。其规范性力量基于一个隐含的技术前提：源代码与目标代码之间存在定义明确、可人工审计且可复现的关系。大型语言模型以及未来的通用人工智能（AGI）系统从根本上违背了这一前提。重建模型所需的协同产物——代码、数据、权重、超参数、工具链及硬件配置——各自受到独立的法律、技术及经济约束，而现有开源框架无法完全协调这些约束。足够强大的AI系统还能将被许可的源代码重写为功能等同的衍生作品，同时剥离其原始义务——这是一种著佐权无法有效防御的清洗行为。本文主张，面向AGI的著佐权功能等价物必须基于可复现构建（而非代码的相同方式共享条款）：该实践确保从声明的输入到比特级精确的可重建性。我们回顾了著佐权的逻辑，批判性审视了Maffulli的“第二解放论”——即AI实现了Stallman的梦想——并指出该论点在AGI系统不具备可复现性时必然失效。基于《开源AI定义》（OSAID）、《模型开放框架》（MOF）、OpenMDW及确定性推理研究，我们定义了面向AGI的可复现构建七项要求。我们进一步论证：模型上下文协议（MCP）及类似的AI间耦合机制构成了新的动态链接层，著佐权式许可对此并不适用；而Masnick的“协议而非平台”框架提供了更有前途的治理模板。