This paper studies algorithmic decision-making in the presence of strategic individual behavior, where an ML model is used to make decisions about human agents and the agents can strategically adapt their behavior to improve their future data. Existing results on strategic learning have largely focused on the linear setting, where agents with linear labeling functions best respond to a (noisy) linear decision policy. This work instead focuses on general non-linear settings, where agents respond to the decision policy with only "local information" about the policy. Moreover, we simultaneously consider the objectives of maximizing decision-maker welfare (model prediction accuracy), social welfare (agent improvement caused by strategic behaviors), and agent welfare (the extent to which ML underestimates the agents). We first generalize the agent best-response model from previous works to the non-linear setting, then examine the compatibility of the welfare objectives. We show that the three welfare objectives can attain their optima simultaneously only under restrictive conditions that are challenging to satisfy in non-linear settings. These theoretical results imply that existing works maximizing the welfare of only a subset of parties inevitably diminish the welfare of the others. We therefore argue for the necessity of balancing the welfare of all parties in non-linear settings and propose an irreducible optimization algorithm suitable for general strategic learning. Experiments on synthetic and real data validate the proposed algorithm.