Algorithmic agents are used in a variety of competitive decision-making settings, including pricing contexts that range from online retail to residential home rental. We study the emergence of algorithmic collusion when competing agents employ multi-armed bandit algorithms and competition is modeled as a repeated Prisoner's Dilemma game. Notably, agents in our setting perform online learning with no prior model of game structure and have no direct knowledge of competitor states or actions, so they cannot learn strategies that depend on these factors. These context-free bandits nonetheless frequently learn seemingly collusive behavior, a phenomenon we term naive collusion. Our results reveal that whether naive collusion emerges depends starkly on the behavior policy employed by the bandit learners. The mechanism underpinning the emergence of collusive outcomes is synchronicity in agent action plays, where synchronicity captures how often agents play the same action. We show that, in the long run, naive algorithmic collusion never emerges when both agents use a broad class of persistently random algorithms, including the epsilon-greedy algorithm without epsilon decay; sometimes emerges when both agents use greedy-in-the-limit algorithms, which feature randomness during exploration but are asymptotically deterministic; and always emerges when both agents use deterministic bandit learning algorithms such as those in the well-known upper confidence bound (UCB) family. We highlight market and algorithmic conditions under which one can and cannot predict a priori whether collusion will occur. Our findings have several policy implications: preventing pricing algorithms from conditioning their actions on competitor prices may not preclude algorithmic collusion, symmetry in algorithms may increase collusion potential, and the emergence of algorithmic collusion is path dependent.
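To make the setting concrete, the following is a minimal sketch of two context-free bandit learners playing a repeated Prisoner's Dilemma, with synchronicity measured as the fraction of rounds in which both play the same action. It is an illustration under stated assumptions, not the experimental code behind the results above: the payoff values, the UCB1 index with unnormalized rewards, the tie-breaking rule, and all names (`PAYOFF`, `Bandit`, `UCB1`, `EpsilonGreedy`, `simulate`) are hypothetical choices made for this example.

```python
import math
import random

# Illustrative Prisoner's Dilemma payoffs (assumed, not taken from the source):
# PAYOFF[(my_action, their_action)] -> my reward; 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}


class Bandit:
    """Context-free two-armed bandit that tracks sample-mean rewards only."""

    def __init__(self):
        self.counts = [0, 0]
        self.means = [0.0, 0.0]

    def update(self, arm, reward):
        # Incremental sample-mean update for the arm that was played.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class UCB1(Bandit):
    """Deterministic learner: initial pulls and ties go to the lower arm index."""

    def act(self, t):
        for arm in (0, 1):
            if self.counts[arm] == 0:
                return arm
        scores = [m + math.sqrt(2.0 * math.log(t) / n)
                  for m, n in zip(self.means, self.counts)]
        return 0 if scores[0] >= scores[1] else 1


class EpsilonGreedy(Bandit):
    """Persistently random learner: explores with a fixed probability epsilon."""

    def __init__(self, epsilon, rng):
        super().__init__()
        self.epsilon = epsilon
        self.rng = rng

    def act(self, t):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(2)
        return 0 if self.means[0] >= self.means[1] else 1


def simulate(agent_a, agent_b, rounds):
    """Return (synchronicity, mutual-cooperation rate) over the given rounds."""
    same = both_cooperate = 0
    for t in range(1, rounds + 1):
        a, b = agent_a.act(t), agent_b.act(t)
        # Each agent observes only its own reward, never the rival's action.
        agent_a.update(a, PAYOFF[(a, b)])
        agent_b.update(b, PAYOFF[(b, a)])
        same += a == b
        both_cooperate += (a == 0 and b == 0)
    return same / rounds, both_cooperate / rounds


if __name__ == "__main__":
    print("UCB1 vs UCB1:", simulate(UCB1(), UCB1(), 5000))
    rng = random.Random(0)
    print("eps-greedy vs eps-greedy:",
          simulate(EpsilonGreedy(0.1, rng), EpsilonGreedy(0.1, rng), 5000))
```

In this sketch, two identical UCB1 learners stay in lockstep by construction: they are deterministic and receive identical reward histories, so each only ever observes the mutual-cooperation and mutual-defection payoffs. The epsilon-greedy pair explores independently, which breaks that synchronicity. The numbers printed depend on the assumed payoffs and parameters and should not be read as a reproduction of the paper's findings.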