As data-driven methods are deployed in real-world settings, the processes that generate the observed data will often react to the decisions of the learner. For example, a data source may have some incentive for the algorithm to provide a particular label (e.g., approve a bank loan), and may manipulate their features accordingly. Work in strategic classification and decision-dependent distributions seeks to characterize the closed-loop behavior of deploying learning algorithms by explicitly considering the effect of the classifier on the underlying data distribution. More recently, works in performative prediction seek to classify the closed-loop behavior by considering general properties of the mapping from classifier to data distribution, rather than an explicit form. Building on this notion, we analyze repeated risk minimization as the perturbed trajectories of the gradient flows of performative risk minimization. We consider the case where there may be multiple local minimizers of performative risk, motivated by situations where the initial conditions may have significant impact on the long-term behavior of the system. We provide sufficient conditions to characterize the region of attraction for the various equilibria in this setting. Additionally, we introduce the notion of performative alignment, which provides a geometric condition on the convergence of repeated risk minimization to performative risk minimizers.
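As a rough illustration of how repeated risk minimization can depend on its initial condition, the following toy sketch may help. It is not the model analyzed in this work: the scalar parameter, the squared loss, the Gaussian data distribution, and the hypothetical distribution map `mu` are all assumptions chosen so that each retraining step has a closed form.

```python
import math


def mu(theta):
    # Hypothetical distribution map (an assumption for illustration):
    # deploying theta shifts the data mean to tanh(2 * theta).
    return math.tanh(2.0 * theta)


def rrm_step(theta):
    # One round of repeated risk minimization under the toy assumptions:
    # data z ~ N(mu(theta), 1) and loss 0.5 * (theta_new - z) ** 2.
    # The minimizer of the expected squared loss is the data mean.
    return mu(theta)


def repeated_risk_minimization(theta0, steps=200):
    # Retrain on the distribution induced by the previously deployed model.
    theta = theta0
    for _ in range(steps):
        theta = rrm_step(theta)
    return theta


if __name__ == "__main__":
    # Nearby initializations on either side of zero end up at different
    # fixed points of mu, i.e. in different regions of attraction.
    for theta0 in (-1.5, -0.1, 0.1, 1.5):
        print(theta0, repeated_risk_minimization(theta0))
```

In this sketch the map `mu` has three fixed points. Iterates started at positive values settle at the positive one and iterates started at negative values settle at the negative one, while zero is a fixed point that nearby iterates move away from. The example only shows sensitivity to initialization; it does not reproduce the sufficient conditions or the performative alignment condition described in the abstract.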