We introduce the Coarse Payoff-Assessment Learning (CPAL) model, which captures reinforcement learning by boundedly rational decision-makers who focus on the aggregate outcomes of choosing among exogenously defined clusters of alternatives (similarity classes), rather than evaluating each alternative individually. Analyzing a smooth approximation of the model, we show that the learning dynamics have steady states corresponding to smooth Valuation Equilibria (Jehiel and Samet, 2007). We demonstrate the existence of multiple equilibria in decision trees with generic payoffs and establish the local asymptotic stability of pure equilibria whenever they exist. By contrast, when trivial choices, in which all alternatives belong to the same similarity class, yield sufficiently high payoffs, a unique mixed equilibrium emerges. This equilibrium is characterized by indifference between similarity classes, even under acute sensitivity to payoff differences. Finally, we prove that this unique mixed equilibrium is globally asymptotically stable under the CPAL dynamics.
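To make the mechanism concrete, here is a minimal Python sketch built on assumptions of our own, not on the paper's specification. A single decision-maker faces one of three decision nodes, drawn uniformly by nature. They value each similarity class by the running average of payoffs realized whenever that class was chosen. They pick classes with logit (softmax) probabilities of sensitivity `lam`, which we use as a stand-in for the smooth approximation. Node 3 is a trivial choice: both of its alternatives belong to class `A`. The toy payoffs, the trivial-choice payoff `H`, and every name (`NODES`, `class_probs`, `expected_valuations`, `simulate`) are hypothetical. `expected_valuations` returns the payoff each class yields on average under logit play. In this toy, a valuation profile that it maps to itself plays the role of a smooth valuation equilibrium.

```python
import math
import random

# Hypothetical toy decision problem (not from the paper). Each node lists
# (alternative, similarity_class, payoff); nature draws a node uniformly.
H = 2.5  # illustrative payoff of the trivial choice at node 3
NODES = [
    [("a1", "A", 1.0), ("b1", "B", 0.0)],
    [("a2", "A", 0.0), ("b2", "B", 3.0)],
    [("a3", "A", H), ("a3p", "A", H)],  # trivial choice: both alternatives in class A
]


def class_probs(node, valuations, lam):
    """Logit (softmax) choice over the similarity classes available at a node."""
    classes = sorted({cls for _, cls, _ in node})
    top = max(valuations[c] for c in classes)  # subtract the max for numerical stability
    weights = [math.exp(lam * (valuations[c] - top)) for c in classes]
    total = sum(weights)
    return {c: w / total for c, w in zip(classes, weights)}


def mean_payoff(node, cls):
    """Average payoff of class `cls` at `node`; alternatives within a class are drawn uniformly."""
    payoffs = [p for _, c, p in node if c == cls]
    return sum(payoffs) / len(payoffs)


def expected_valuations(nodes, valuations, lam):
    """Expected payoff conditional on choosing each class, given logit play at every node."""
    num, den = {}, {}
    for node in nodes:  # nodes are equally likely, so the common 1/len(nodes) weight cancels
        for cls, prob in class_probs(node, valuations, lam).items():
            num[cls] = num.get(cls, 0.0) + prob * mean_payoff(node, cls)
            den[cls] = den.get(cls, 0.0) + prob
    return {c: num[c] / den[c] for c in num}


def simulate(nodes, lam, periods, seed=0):
    """Coarse payoff assessment (assumed 1/n update rule; the paper's CPAL rule may differ):
    each class valuation is the running average of payoffs realized when that class was chosen."""
    rng = random.Random(seed)
    classes = sorted({cls for node in nodes for _, cls, _ in node})
    valuations = {c: 0.0 for c in classes}
    counts = {c: 0 for c in classes}
    for _ in range(periods):
        node = rng.choice(nodes)
        probs = class_probs(node, valuations, lam)
        chosen = rng.choices(list(probs), weights=list(probs.values()))[0]
        # Alternatives within a class are not told apart, so one is drawn uniformly.
        _, _, payoff = rng.choice([alt for alt in node if alt[1] == chosen])
        counts[chosen] += 1
        valuations[chosen] += (payoff - valuations[chosen]) / counts[chosen]
    return valuations


if __name__ == "__main__":
    # Equal valuations of 1.5 are mapped to themselves at low and high sensitivity.
    for lam in (5.0, 50.0):
        print(lam, expected_valuations(NODES, {"A": 1.5, "B": 1.5}, lam))
    # Learning from zero valuations; both running averages should drift toward 1.5.
    print(simulate(NODES, lam=5.0, periods=60000, seed=1))
```

With `H = 2.5`, the toy's fixed point sits at equal valuations of 1.5, with nodes 1 and 2 mixing 50/50 for every `lam`. A profile that strictly favors `B` makes `A` look better than `B`, and one that strictly favors `A` does the opposite, so neither pure profile confirms itself. This loosely mirrors, in a toy setting only, the mixed equilibrium with indifference described above.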