The role of discretization scales in causal inference with continuous-time treatment

Methods for identifying the causal effect of a time-varying treatment applied at discrete time points are well established. In practice, however, many treatments vary continuously in time, or on a finer time scale than the one used for measurement or analysis. Although researchers have used simulations and empirical data to investigate how estimates differ across discretization scales, it remains unclear how the choice of discretization scale affects causal inference. To address this gap, we present a framework for understanding how discretization scales affect the properties of causal inferences about the effect of a time-varying treatment. We introduce the concept of "identification bias", defined as the difference between the causal estimand for a continuous-time treatment and the purported estimand of a discretized version of that treatment. We show that this bias can persist even with an infinite number of longitudinal treatment-outcome trajectories. We examine the identification problem in a class of linear stochastic continuous-time data-generating processes and demonstrate the identification bias of the g-formula in this setting. Our findings indicate that discretization bias can substantially affect empirical analysis, especially when repeated measurements are few. We therefore recommend that researchers choose the discretization scale carefully and perform sensitivity analysis to address this bias. We also propose a simple heuristic measure of sensitivity to the discretization scale and suggest that researchers report it alongside point and interval estimates, so that the potential impact of discretization bias on causal inference can be better understood and addressed.
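The definition of identification bias given above can be written out as a rough sketch. The notation is an assumption made here for illustration and is not taken from the paper: \psi is the continuous-time causal estimand, \psi_{\Delta} is the estimand purportedly identified when the treatment is discretized at scale \Delta, and \hat\psi_{\Delta,n} is an estimate computed from n trajectories. The sign convention is also assumed.

```latex
% Illustrative notation only (assumed, not the paper's):
%   \psi                 continuous-time causal estimand
%   \psi_{\Delta}        purported estimand after discretizing at scale \Delta
%   \hat\psi_{\Delta,n}  estimate from n treatment-outcome trajectories
\[
  b(\Delta) \;=\; \psi_{\Delta} - \psi ,
  \qquad
  \hat\psi_{\Delta,n} - \psi
  \;=\;
  \underbrace{\bigl(\hat\psi_{\Delta,n} - \psi_{\Delta}\bigr)}_{\text{estimation error}}
  \;+\;
  \underbrace{b(\Delta)}_{\text{identification bias}} .
\]
```

Under the further assumption that the estimator is consistent for \psi_{\Delta}, the first term vanishes as n grows, while b(\Delta) does not depend on n. This is one way to read the statement that the bias can persist with infinitely many trajectories.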
