We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift either as adversarial or through an overly simplistic shift structure. In contrast, we propose a coupled partial differential equation (PDE) model that captures fine-grained changes in the distribution over time by accounting for the complex dynamics arising from strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of distribution shift. We consider two common settings in machine learning: cooperative settings with information asymmetries, and competitive settings in which a learner faces strategic users. For both settings, when the algorithm retrains via gradient descent, we prove asymptotic convergence of the retraining procedure to a steady state, in both finite and infinite dimensions, and obtain explicit rates in terms of the model parameters. To do so, we derive new results on the convergence of coupled PDEs that extend what is known for multi-species systems. Empirically, we show that our approach captures well-documented forms of distribution shift, such as polarization and disparate impact, that simpler models fail to reproduce.
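To make the feedback loop concrete, here is a minimal toy sketch. It is not the paper's model: it tracks a single population on a 1-D grid and omits the non-local interaction terms and multiple species. The population density p(x, t) follows a Fokker-Planck equation, ∂ₜp = ∂ₓ(p ∂ₓV) + σ ∂ₓₓp. The potential V(x) = (α/2)(x − θ)² + (β/2)(x − μ₀)² combines a hypothetical strategic pull toward the deployed parameter θ with an exogenous pull toward μ₀. The learner takes one gradient step per time step on the regularized loss ½E_p[(θ − x)²] + (λ/2)θ². All names (`alpha`, `beta`, `mu0`, `sigma`, `lam`, `eta`) are illustrative assumptions. In this quadratic case the continuum steady state can be found by hand. The density is Gaussian with mean (αθ + βμ₀)/(α + β), and the learner's fixed point (1 + λ)θ = mean then gives θ* = βμ₀ / ((1 + λ)(α + β) − α).

```python
"""Toy sketch (hypothetical): a 1-D population density coupled to a learner
that retrains by gradient descent. Not the paper's model; all parameters are
illustrative assumptions."""


def simulate(alpha=1.0, beta=1.0, mu0=1.0, sigma=1.0, lam=0.5, eta=1.0,
             theta0=-2.0, L=4.0, n=80, dt=0.002, T=15.0):
    dx = 2 * L / n
    xs = [-L + (i + 0.5) * dx for i in range(n)]      # cell centres on [-L, L]
    faces = [-L + (i + 1) * dx for i in range(n - 1)]  # interior cell interfaces
    p = [1.0 / (2 * L)] * n                            # uniform initial density
    theta = theta0
    mean = sum(x * q for x, q in zip(xs, p)) * dx
    for _ in range(int(round(T / dt))):
        # V'(x) = alpha*(x - theta) + beta*(x - mu0): assumed strategic pull
        # toward the deployed parameter plus an exogenous pull toward mu0
        v = [alpha * (f - theta) + beta * (f - mu0) for f in faces]
        # G = p V' + sigma p_x at interior faces (central differences);
        # G = 0 at both walls (no-flux), so total mass is conserved exactly
        g = ([0.0]
             + [v[i] * 0.5 * (p[i] + p[i + 1]) + sigma * (p[i + 1] - p[i]) / dx
                for i in range(n - 1)]
             + [0.0])
        # explicit Euler step of dp/dt = d/dx (p V' + sigma p_x)
        p = [p[i] + dt * (g[i + 1] - g[i]) / dx for i in range(n)]
        # learner: one gradient step on 0.5*E_p[(theta - x)^2] + 0.5*lam*theta^2
        mean = sum(x * q for x, q in zip(xs, p)) * dx
        theta -= eta * dt * ((1 + lam) * theta - mean)
    return theta, mean, sum(p) * dx, min(p)


def theta_star(alpha=1.0, beta=1.0, mu0=1.0, lam=0.5):
    # continuum fixed point of (1 + lam)*theta = (alpha*theta + beta*mu0)/(alpha + beta)
    return beta * mu0 / ((1 + lam) * (alpha + beta) - alpha)


if __name__ == "__main__":
    theta, mean, mass, _ = simulate()
    print(f"theta={theta:.4f}  closed form={theta_star():.4f}  mass={mass:.6f}")
```

With the defaults, `simulate()` should drive θ toward `theta_star()` (0.5 for these parameters) up to a small discretization error, while the density's total mass stays at 1. The central-difference flux with zero-flux walls is a simple choice that conserves mass exactly. The explicit time step must stay well below dx²/(2σ) for stability and to keep the density nonnegative.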