Real-world systems often involve a pool of users choosing among a set of services. With the growing popularity of online learning algorithms, these services can now self-optimize, leveraging data collected on users to maximize a reward such as service quality. On the flip side, users may strategically choose which services to use in pursuit of their own reward functions, and in doing so they wield power over which services can see and use their data. Extensive prior research has studied the effects of strategic users in single-service settings, where strategic behavior manifests as the manipulation of observable features to achieve a desired classification; however, such manipulation is often costly or unattainable for users, and this line of work fails to capture the full behavior of multi-service dynamic systems. We therefore analyze a setting in which strategic users choose among several available services in pursuit of positive classifications, while services seek to minimize loss functions on their observations. We focus our analysis on realizable settings and show that naive retraining can still lead to oscillation even if all users are observed at different times; however, if retraining uses memory of past observations, convergent behavior can be guaranteed for certain classes of loss functions. We provide results on synthetic and real-world data to empirically validate our theoretical findings.