We study dynamic discrete choice models, where a common problem is estimating the parameters of agent reward functions (also known as "structural" parameters) from agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We then apply a clustering algorithm to these estimated Q-functions to select a subset of states that are the most pivotal for driving changes in Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation using the widely used nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off among computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
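The state-aggregation step of the first stage can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, so this example assumes plain k-means applied to each state's vector of estimated Q-values. The function name `aggregate_states` and the input layout `q_values[s][a]` are hypothetical and chosen only for illustration.

```python
def aggregate_states(q_values, n_clusters, n_iter=100):
    """Group states by their estimated Q-value vectors using plain k-means.

    Assumption (not from the paper): q_values[s][a] holds an estimate of
    Q(s, a), and k-means stands in for the unspecified clustering algorithm.
    Returns labels, where labels[s] is the aggregated-state index of state s.
    """
    n_states = len(q_values)
    if not 1 <= n_clusters <= n_states:
        raise ValueError("n_clusters must be between 1 and the number of states")

    def sq_dist(u, v):
        # Squared Euclidean distance between two Q-value vectors.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # Deterministic initialisation: evenly spaced states after sorting by Q-vector.
    order = sorted(range(n_states), key=lambda s: q_values[s])
    step = max(n_clusters - 1, 1)
    centers = [
        list(q_values[order[(i * (n_states - 1)) // step]])
        for i in range(n_clusters)
    ]

    labels = None
    for _ in range(n_iter):
        # Assignment step: each state joins its nearest center.
        new_labels = [
            min(range(n_clusters), key=lambda k: sq_dist(q, centers[k]))
            for q in q_values
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its member states.
        for k in range(n_clusters):
            members = [q_values[s] for s in range(n_states) if labels[s] == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy example: five states, two actions; states 0, 1, 4 have similar Q-values.
toy_q = [[0.0, 1.0], [0.1, 1.1], [5.0, 2.0], [5.1, 2.1], [0.05, 1.0]]
toy_labels = aggregate_states(toy_q, n_clusters=2)
```

In a second stage, the resulting labels would define the reduced state space on which the nested fixed-point likelihood is evaluated. How transitions are aggregated across clusters is not described in the abstract and is left out of this sketch.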