We initiate a principled study of algorithmic collective action on digital platforms that deploy machine learning algorithms. We propose a simple theoretical model of a collective interacting with a firm's learning algorithm. The collective pools the data of participating individuals and executes an algorithmic strategy by instructing participants how to modify their own data to achieve a collective goal. We investigate the consequences of this model in three fundamental learning-theoretic settings: a nonparametric optimal learning algorithm, a parametric risk minimizer, and gradient-based optimization. In each setting, we design coordinated algorithmic strategies and characterize natural success criteria as a function of the collective's size. Complementing our theory, we conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers. Across more than two thousand training runs of a BERT-like language model, we observe a striking correspondence between our empirical observations and the predictions of our theory. Taken together, our theory and experiments broadly support the conclusion that algorithmic collectives of exceedingly small fractional size can exert significant control over a platform's learning algorithm.
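To make the model concrete, the following is a minimal toy sketch of a collective that pools its data and plants a signal. It is not the paper's construction or experimental setup: the synthetic population, the per-feature majority-vote learner (a stand-in for a nonparametric learner on discrete features), and all names (`make_base_data`, `plant_signal`, `fit_majority_classifier`, `signal_succeeds`) are assumptions introduced here for illustration. A fraction `alpha` of individuals replaces its own record with an agreed feature value and target label, and success means the trained classifier outputs the target label on that feature value.

```python
import random
from collections import Counter, defaultdict


def make_base_data(n, rng):
    """Toy population (assumption): feature x in 0..9, label equals the
    parity of x with probability 0.9 and is flipped otherwise."""
    data = []
    for _ in range(n):
        x = rng.randrange(10)
        y = x % 2 if rng.random() < 0.9 else 1 - x % 2
        data.append((x, y))
    return data


def plant_signal(data, alpha, signal_x, target_y, rng):
    """A random fraction alpha of individuals (the collective) replaces
    its own record with the agreed pair (signal_x, target_y)."""
    n_members = int(round(alpha * len(data)))
    members = set(rng.sample(range(len(data)), n_members))
    return [
        (signal_x, target_y) if i in members else point
        for i, point in enumerate(data)
    ]


def fit_majority_classifier(data):
    """Stand-in learner: predict the most frequent label seen for each
    feature value in the training data."""
    counts = defaultdict(Counter)
    for x, y in data:
        counts[x][y] += 1
    return {x: c.most_common(1)[0][0] for x, c in counts.items()}


def signal_succeeds(alpha, n=10_000, signal_x=4, target_y=1, seed=0):
    """Return True if the learner maps signal_x to target_y after the
    collective has modified its share of the training data."""
    rng = random.Random(seed)
    base = make_base_data(n, rng)
    modified = plant_signal(base, alpha, signal_x, target_y, rng)
    classifier = fit_majority_classifier(modified)
    return classifier[signal_x] == target_y


if __name__ == "__main__":
    for alpha in (0.02, 0.15):
        print(alpha, signal_succeeds(alpha))
```

In this toy population the signal value `x = 4` is common and carries label 0 about 90% of the time, so a small collective is outvoted and a larger one is not. Choosing a rarer signal value lowers the fraction needed, which is the qualitative dependence of success on collective size that the abstract describes.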