We study dynamic discrete choice models, where a commonly studied problem is estimating the parameters of agent reward functions (also known as "structural" parameters) from agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We then use these estimated Q-functions, along with a clustering algorithm, to select a subset of states that are the most pivotal for driving changes in Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation using a standard nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off among computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
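The state-aggregation idea behind the first stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the aggregation weights, so plain k-means (Lloyd's algorithm) on the rows of an estimated Q-matrix and uniform within-cluster weights are assumptions made here for illustration. The names `aggregate_states`, `aggregate_transitions`, `q_hat`, and `P` are hypothetical, and the toy numbers are invented. The second-stage nested fixed-point estimation would then run on the reduced problem and is not shown.

```python
import numpy as np

def aggregate_states(q_hat, n_clusters, n_iter=50, seed=0):
    """Group states whose estimated Q-value rows are close (plain k-means, an assumption)."""
    rng = np.random.default_rng(seed)
    n_states = q_hat.shape[0]
    # Initialise centroids at randomly chosen distinct states.
    centroids = q_hat[rng.choice(n_states, size=n_clusters, replace=False)].copy()
    labels = np.zeros(n_states, dtype=int)
    for _ in range(n_iter):
        # Squared distance from every state's Q-row to every centroid.
        dists = ((q_hat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = q_hat[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return labels

def aggregate_transitions(P, labels, n_clusters):
    """Collapse P[a, s, s'] onto aggregated states with uniform within-cluster weights.

    Assumes every cluster has at least one member state.
    """
    M = np.eye(n_clusters)[labels]              # M[s, k] = 1 if state s is in cluster k
    weights = M / M.sum(axis=0, keepdims=True)  # uniform weights inside each cluster
    return np.stack([weights.T @ P[a] @ M for a in range(P.shape[0])])

# Toy first-stage output (invented numbers): 6 states, 2 actions.
q_hat = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1],
                  [0.0, 2.0], [0.1, 2.1], [-0.1, 1.9]])
labels = aggregate_states(q_hat, n_clusters=2)

# Toy transition kernel: one row-stochastic 6x6 matrix per action.
P = np.random.default_rng(1).dirichlet(np.ones(6), size=(2, 6))
P_agg = aggregate_transitions(P, labels, n_clusters=2)
print(labels, P_agg.shape)
```

In this toy case the dynamic program that the second stage must solve shrinks from 6 states to 2, which is the sense in which aggregation reduces the problem dimension. The aggregated kernel `P_agg` remains row-stochastic because each cluster's row is an average of its members' rows.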