We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift either as adversarial or through an overly simplistic distribution-shift structure. In contrast, we propose a coupled partial differential equation (PDE) model that captures fine-grained changes in the distribution over time by accounting for the complex dynamics that arise from strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of distribution shift. We consider two common settings in machine learning: cooperative settings with information asymmetries, and competitive settings where a learner faces strategic users. For both settings, when the algorithm retrains via gradient descent, we prove asymptotic convergence of the retraining procedure to a steady state, in both finite and infinite dimensions, and obtain explicit rates in terms of the model parameters. To do so, we derive new results on the convergence of coupled PDEs that extend what is known about multi-species systems. Empirically, we show that our approach captures well-documented forms of distribution shift, such as polarization and disparate impacts, that simpler models cannot capture.