Machine learning systems are widely used to make decisions about individuals, who may respond strategically to receive favorable outcomes: they may genuinely improve their true labels, or they may manipulate observable features directly to game the system without changing their labels. Although both behaviors have been studied in the literature (often as two separate problems), most works assume that individuals can (i) perfectly foresee the outcomes of their behaviors when they best respond, and (ii) change their features arbitrarily as long as it is affordable, with costs that are deterministic functions of the feature changes. In this paper, we consider a different setting and focus on imitative strategic behaviors with unforeseeable outcomes, i.e., individuals manipulate or improve by imitating the features of those with positive labels, but the induced feature changes are unforeseeable. We first propose a Stackelberg game to model the interplay between individuals and the decision-maker, under which we examine how the decision-maker's ability to anticipate individual behavior affects its objective function and the individual's best response. We show that the difference between the objectives of a decision-maker who anticipates individual behavior and one who does not can be decomposed into three interpretable terms, each representing the decision-maker's preference for a certain behavior. By exploring the role of each term, we further illustrate how a decision-maker with adjusted preferences can simultaneously disincentivize manipulation, incentivize improvement, and promote fairness.