In domains where agents interact strategically, game theory is widely applied to predict how agents will behave. However, game-theoretic predictions assume that agents are fully rational and believe in equilibrium play, and these assumptions often fail when human decision makers are involved. To address this limitation, a number of behavioral game-theoretic models have been proposed to account for the limited rationality of human decision makers. The "quantal cognitive hierarchy" (QCH) model, one of the more recent of these, has been shown to be the state-of-the-art model for predicting human behavior in normal-form games. The QCH model assumes that agents in games can be either non-strategic (level-0) or strategic (level-$k$). Level-0 agents choose their strategies irrespective of other agents. Level-$k$ agents assume that other agents behave at levels less than $k$ and best respond to them. An important assumption of the QCH model, however, is that agents' levels follow a Poisson distribution. In this paper, we relax this assumption and design a learning-based method at the population level to iteratively estimate the empirical distribution of agents' reasoning levels. Using a real-world dataset from the Swedish lowest unique positive integer game, we demonstrate how our refined QCH model and the iterative solution-seeking process provide a more accurate behavioral model for agents. This leads to a better fit to the real data and allows us to track an agent's progress in learning to play strategically over multiple rounds.
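To make the level-$k$ recursion described in the abstract concrete, the following is a minimal Python sketch of a cognitive hierarchy with quantal responses. It is an illustration under stated assumptions, not the paper's implementation: it assumes a symmetric two-player normal-form game rather than the multi-player lowest unique positive integer game, a uniformly random level-0 strategy, and a single precision parameter shared by all levels. All function and variable names (`poisson_level_weights`, `qch_strategies`, `predicted_play`, `payoff`, `precision`) are hypothetical.

```python
import numpy as np
from math import exp, factorial


def poisson_level_weights(tau, max_level):
    """Poisson(tau) probabilities over levels 0..max_level, renormalized after truncation."""
    w = np.array([exp(-tau) * tau**k / factorial(k) for k in range(max_level + 1)])
    return w / w.sum()


def quantal_response(expected_utils, precision):
    """Softmax (logit) response: higher precision approaches an exact best response."""
    z = precision * (expected_utils - expected_utils.max())  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


def qch_strategies(payoff, level_weights, precision):
    """Return one mixed strategy per level for a symmetric two-player game.

    payoff[i, j] is the payoff for playing action i against action j (assumed layout).
    """
    n_actions = payoff.shape[0]
    strategies = [np.full(n_actions, 1.0 / n_actions)]  # level-0: uniform (assumption)
    for k in range(1, len(level_weights)):
        lower = np.asarray(level_weights[:k], dtype=float)
        total = lower.sum()
        # Belief over lower levels: the level distribution restricted to levels < k.
        # Fall back to a uniform belief if those levels carry no weight.
        belief = lower / total if total > 0 else np.full(k, 1.0 / k)
        opponent_mix = sum(b * s for b, s in zip(belief, strategies))
        strategies.append(quantal_response(payoff @ opponent_mix, precision))
    return np.array(strategies)


def predicted_play(payoff, level_weights, precision):
    """Population-level action distribution: levels mixed by their weights."""
    return np.asarray(level_weights) @ qch_strategies(payoff, level_weights, precision)


# Usage: a prisoner's-dilemma-style payoff matrix (illustrative numbers only).
payoff = np.array([[3.0, 0.0],
                   [5.0, 1.0]])
weights = poisson_level_weights(tau=1.5, max_level=3)
strategies = qch_strategies(payoff, weights, precision=2.0)
pred = predicted_play(payoff, weights, precision=2.0)
print(pred)
```

Because `qch_strategies` accepts any vector of level weights, the Poisson weights can be swapped for an empirically estimated distribution, which is the kind of relaxation the abstract describes. How that distribution is estimated iteratively from data is not specified here and is left out of the sketch.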