We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each stage in the data cycle, from inputs to model weights to synthetic outputs, refines technical signal but strips economic equity from data generators. Analyzing seventy-three public data deals, we show that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults (missing provenance, asymmetric bargaining power, and non-dynamic pricing) as the operational machinery of this inequality. We trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals, and we situate our position relative to related and orthogonal viewpoints.