This paper investigates the theoretical foundation and develops analytical formulas for sample size and power calculations for causal inference with observational data. By analyzing the variance of an inverse probability weighting estimator of the average treatment effect, we decompose the power calculation into three components: propensity score distribution, potential outcome distribution, and their correlation. We show that to determine the minimal sample size of an observational study, in addition to the standard inputs in the power calculation of randomized trials, it is sufficient to have two parameters, which quantify the strength of the confounder-treatment and the confounder-outcome association, respectively. For the former, we propose using the Bhattacharyya coefficient, which measures the covariate overlap and, together with the treatment proportion, leads to a uniquely identifiable and easily computable propensity score distribution. For the latter, we propose a sensitivity parameter bounded by the R-squared statistic of the regression of the outcome on covariates. Our procedure relies on a parametric propensity score model and a semiparametric restricted mean outcome model, but does not require distributional assumptions on the multivariate covariates. We develop an associated R package PSpower.
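To give a concrete sense of the overlap measure mentioned above, the following is a minimal sketch of the Bhattacharyya coefficient, BC = ∫ sqrt(f1(x) f0(x)) dx, between the covariate densities of the treated and control groups. It assumes, purely for illustration, a single covariate that is normally distributed within each group; the procedure described above does not require this assumption. The function names are hypothetical and are not part of the PSpower package.

```python
import math


def bhattacharyya_normal(mu1, sd1, mu0, sd0):
    """Closed-form Bhattacharyya coefficient between N(mu1, sd1^2) and N(mu0, sd0^2).

    Illustrative assumption: one normal covariate per treatment group.
    """
    var_sum = sd1 ** 2 + sd0 ** 2
    scale = math.sqrt(2.0 * sd1 * sd0 / var_sum)
    return scale * math.exp(-((mu1 - mu0) ** 2) / (4.0 * var_sum))


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) evaluated at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bhattacharyya_numeric(mu1, sd1, mu0, sd0, lo=-20.0, hi=20.0, n=40001):
    """Trapezoid-rule approximation of the integral of sqrt(f1 * f0) over [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # half weight at the endpoints
        total += weight * math.sqrt(
            normal_pdf(x, mu1, sd1) * normal_pdf(x, mu0, sd0)
        )
    return total * h


if __name__ == "__main__":
    # Identical groups overlap perfectly; separating the means lowers the coefficient.
    for shift in (0.0, 0.5, 1.0, 2.0):
        closed = bhattacharyya_normal(shift, 1.0, 0.0, 1.0)
        approx = bhattacharyya_numeric(shift, 1.0, 0.0, 1.0)
        print(f"mean shift {shift:.1f}: closed form {closed:.4f}, numeric {approx:.4f}")
```

A coefficient of 1 corresponds to identical covariate distributions in the two groups, as in a randomized trial, and values closer to 0 indicate poorer overlap and hence stronger confounder-treatment association.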