We develop methods for estimating how infinitesimal policy changes affect long-term outcomes in dynamic systems. We show that dynamic marginal policy effects (MPEs) can be identified via tractable reduced-form expressions and can be estimated under a general sequential unconfoundedness assumption. We also propose a doubly robust estimator for dynamic MPEs. Our approach does not require observing full dynamic state information (as is typically assumed for off-policy evaluation in Markov decision processes), and it does not incur an exponential curse of horizon (as is typical in non-Markovian off-policy evaluation). We demonstrate the practicality and robustness of our approach in several simulations, including one motivated by a dynamic pricing application in which people use past prices to form a reference level for current prices.