We analyze the effect that online algorithms have on the environment they are learning. As a motivating example, consider recommendation systems that use online algorithms to learn optimal product recommendations from user and product attributes. It is well known that the sequence of recommendations affects user preferences. However, typical learning algorithms treat the user attributes as static and disregard the impact of their recommendations on user preferences. We aim to analyze the effect of this mismatch between the model assumption of a static environment and the reality of an evolving environment shaped by the recommendations. To perform this analysis, we first introduce a model for the generic coupled evolution of the parameters being learned and the environment they affect. We then cast a linear bandit recommendation system (RS) into this generic model, in which each user is characterized by a state variable that evolves based on the sequence of recommendations. The learning algorithm of the RS does not explicitly account for this evolution and assumes that the users are static. We describe a dynamical system model that captures the coupled evolution of the population state and the learning algorithm, and we analyze its equilibrium behavior. We show that when the recommendation algorithm is able to learn the population preferences despite this mismatch, it induces similarity in the preferences of the user population. In particular, we present results on how different properties of the recommendation algorithm, namely the user attribute space and the exploration-exploitation tradeoff, affect the population preferences when those preferences are learned by the algorithm. We demonstrate these results using model simulations.
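The coupled evolution described above can be illustrated with a toy simulation. The sketch below is not the authors' model: the epsilon-greedy exploration rule, the ridge-regression estimate of a single population-level preference vector, the unit-norm user states and product features, and the drift rule that pulls every user state toward the recommended product are all assumptions chosen for illustration, and `simulate_coupled_bandit` and `mean_pairwise_cosine` are hypothetical names. The learner pools user responses as if the population were static, while the states actually move; the mean pairwise cosine similarity of user states serves as a crude proxy for preference similarity.

```python
import numpy as np


def mean_pairwise_cosine(users):
    """Average cosine similarity over distinct user pairs (rows assumed unit-norm)."""
    n = users.shape[0]
    gram = users @ users.T
    return (gram.sum() - np.trace(gram)) / (n * (n - 1))


def simulate_coupled_bandit(n_users=50, n_products=10, dim=5, rounds=200,
                            eta=0.05, epsilon=0.1, lam=1.0, noise=0.1, seed=0):
    """Hypothetical coupled dynamics: an epsilon-greedy linear bandit that assumes
    a static population, while user states drift toward each recommendation."""
    rng = np.random.default_rng(seed)

    # Unit-norm user preference states and product feature vectors (assumed).
    users = rng.normal(size=(n_users, dim))
    users /= np.linalg.norm(users, axis=1, keepdims=True)
    products = rng.normal(size=(n_products, dim))
    products /= np.linalg.norm(products, axis=1, keepdims=True)

    # Ridge-regression statistics for one population-level preference vector.
    A = lam * np.eye(dim)
    b = np.zeros(dim)
    sims = [mean_pairwise_cosine(users)]

    for _ in range(rounds):
        theta_hat = np.linalg.solve(A, b)
        # Exploration-exploitation tradeoff via epsilon-greedy (an assumption).
        if rng.random() < epsilon:
            k = int(rng.integers(n_products))
        else:
            k = int(np.argmax(products @ theta_hat))
        a = products[k]

        # Every user rates the recommended product; the learner pools these
        # responses as if the population were static (the model mismatch).
        rewards = users @ a + noise * rng.normal(size=n_users)
        A += n_users * np.outer(a, a)
        b += rewards.sum() * a

        # Unmodeled environment update: each user state moves toward the
        # recommended product and is renormalized (hypothetical drift rule).
        users = users + eta * a
        users /= np.linalg.norm(users, axis=1, keepdims=True)
        sims.append(mean_pairwise_cosine(users))

    theta_hat = np.linalg.solve(A, b)
    return theta_hat, users, np.array(sims)


theta_hat, users, sims = simulate_coupled_bandit()
print(f"mean pairwise cosine: start {sims[0]:.3f}, end {sims[-1]:.3f}")
```

Varying `epsilon`, or `dim` as a loose stand-in for the size of the attribute space, gives a rough way to probe the two algorithm properties named above within this toy model only; it is not a substitute for the equilibrium analysis.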