As data-driven methods are deployed in real-world settings, the processes that generate the observed data will often react to the decisions of the learner. For example, a data source may have an incentive for the algorithm to output a particular label (e.g., approval of a bank loan) and may manipulate its features accordingly. Work in strategic classification and decision-dependent distributions seeks to characterize the closed-loop behavior of deployed learning algorithms by explicitly modeling the effect of the classifier on the underlying data distribution. More recently, work in performative prediction seeks to characterize this closed-loop behavior through general properties of the mapping from classifier to data distribution, rather than through an explicit form of that mapping. Building on this notion, we analyze repeated risk minimization as a perturbed trajectory of the gradient flow of performative risk minimization. We consider the case where performative risk may have multiple local minimizers, motivated by situations where the initial conditions can significantly affect the long-term behavior of the system. We provide sufficient conditions that characterize the region of attraction of each equilibrium in this setting. Additionally, we introduce the notion of performative alignment, which provides a geometric condition for the convergence of repeated risk minimization to performative risk minimizers.
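To make the contrast between repeated risk minimization (RRM) and gradient flow on performative risk concrete, the sketch below uses a hypothetical one-dimensional toy model that is not taken from the paper. It assumes that deploying a parameter `theta` induces a distribution D(theta) with mean m(theta) = 2 tanh(theta) and variance `SIGMA2`, and that the loss is squared error (z - theta)^2 / 2. Under these assumptions the performative risk is PR(theta) = ((m(theta) - theta)^2 + sigma^2) / 2, which has three minimizers (the fixed points 0 and roughly +/-1.915), and each RRM step simply refits to the mean of the current distribution. The gradient flow is approximated by forward-Euler steps; all names and constants here are illustrative assumptions.

```python
import math

SIGMA2 = 0.25  # hypothetical variance of z under every D(theta)


def mean_map(theta: float) -> float:
    """Hypothetical mean of D(theta): the population shifts toward 2*tanh(theta)."""
    return 2.0 * math.tanh(theta)


def mean_map_grad(theta: float) -> float:
    """Derivative m'(theta) of the hypothetical mean map."""
    return 2.0 / math.cosh(theta) ** 2


def performative_risk(theta: float) -> float:
    """PR(theta) = E_{z ~ D(theta)}[(z - theta)^2 / 2] = ((m(theta) - theta)^2 + sigma^2) / 2."""
    return 0.5 * ((mean_map(theta) - theta) ** 2 + SIGMA2)


def performative_risk_grad(theta: float) -> float:
    """PR'(theta) = (m(theta) - theta) * (m'(theta) - 1)."""
    return (mean_map(theta) - theta) * (mean_map_grad(theta) - 1.0)


def repeated_risk_minimization(theta0: float, steps: int = 200) -> float:
    """RRM: deploy theta_t, then minimize the loss on D(theta_t).

    For squared loss the minimizer over theta of E_{z ~ D(theta_t)}[(z - theta)^2 / 2]
    is the mean m(theta_t), so each RRM step is a fixed-point iteration of m.
    """
    theta = theta0
    for _ in range(steps):
        theta = mean_map(theta)
    return theta


def performative_gradient_descent(theta0: float, lr: float = 0.1, steps: int = 5000) -> float:
    """Forward-Euler discretization of the gradient flow d(theta)/dt = -PR'(theta)."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * performative_risk_grad(theta)
    return theta


# Local maxima of PR sit where m'(theta) = 1, i.e. cosh(theta)^2 = 2; in this toy model
# they separate the regions of attraction of the three minimizers under the gradient flow.
BOUNDARY = math.acosh(math.sqrt(2.0))

if __name__ == "__main__":
    for theta0 in (-1.5, -0.5, 0.5, 1.5):
        rrm = repeated_risk_minimization(theta0)
        gd = performative_gradient_descent(theta0)
        print(f"theta0={theta0:+.1f}  RRM -> {rrm:+.4f}  gradient flow -> {gd:+.4f}")
```

In this toy model, starting from `theta0 = 0.5` (inside the region of attraction of 0, since 0.5 < `BOUNDARY`), the gradient iterates settle at the minimizer 0, while RRM moves away to the positive fixed point near 1.915 because |m'(0)| = 2 > 1 makes 0 unstable for the fixed-point iteration. Starting from `theta0 = 1.5`, both procedures reach the same minimizer. This illustrates, under the stated assumptions only, how initial conditions and the geometry of the distribution map can determine whether RRM and performative risk minimization agree.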