In causal inference, properly selecting the propensity score (PS) model is an important topic that has been widely investigated in observational studies, and there is also a large literature on the missing data problem. However, very few studies have investigated model selection for causal inference when the exposure is missing at random (MAR). In this paper, we discuss how to select both the imputation model and the PS model so as to minimize the root mean squared error (RMSE) of the estimated causal effect in our simulation study. We then propose a new criterion, called the ``rank score'', for evaluating the overall performance of both models. The simulation studies show that the full imputation model combined with an outcome-related PS model yields the smallest RMSE, and that the rank score can help select the best models. We also present an application that quantifies the causal effect of cardiovascular disease (CVD) on the mortality of COVID-19 patients.
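The sketch below illustrates the kind of comparison described above: a Python (NumPy) simulation that crosses two imputation models with two PS models and reports the RMSE of each combination. Everything in it is an illustrative assumption, not the paper's procedure. It uses a hypothetical data-generating process with a binary exposure `A` that is MAR given a confounder `X1` and the outcome `Y`, logistic imputation fitted on complete cases (without drawing the imputation parameters, so it is an "improper" multiple imputation), a normalized inverse-probability-weighted (IPW) estimator of the average treatment effect, and our own labels "full"/"reduced" for the imputation models and "confounder"/"outcome" for the PS models. The rank score is not reproduced because its definition is not given here.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=30, tol=1e-10):
    """Logistic regression by Newton-Raphson; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def ipw_ate(a, y, ps):
    """Normalized (Hajek) inverse-probability-weighted estimate of the ATE."""
    w1, w0 = a / ps, (1 - a) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def simulate_once(rng, n=1000, n_imp=5, true_ate=1.0):
    # Hypothetical data-generating process (an assumption, not the paper's design).
    x1 = rng.normal(size=n)                      # confounder: affects A and Y
    x2 = rng.normal(size=n)                      # outcome-only predictor
    a = rng.binomial(1, expit(-0.2 + 0.8 * x1)).astype(float)
    y = true_ate * a + x1 + x2 + rng.normal(size=n)
    # Exposure is MAR: missingness depends only on the observed X1 and Y.
    miss = rng.random(n) < expit(-1.0 + 0.5 * x1 + 0.5 * y)
    obs = ~miss
    a_obs = np.where(miss, np.nan, a)            # the analyst never sees A when missing
    ones = np.ones(n)
    imp_designs = {
        "full": np.column_stack([ones, x1, x2, y]),  # covariates and outcome
        "reduced": np.column_stack([ones, x1]),      # omits X2 and Y
    }
    ps_designs = {
        "confounder": np.column_stack([ones, x1]),
        "outcome": np.column_stack([ones, x1, x2]),  # adds the outcome-only predictor
    }
    est = {}
    for imp_name, Z in imp_designs.items():
        # Fit the imputation model on complete cases only.
        gamma = fit_logistic(Z[obs], a_obs[obs])
        p_miss = expit(Z[miss] @ gamma)
        draws = {ps_name: [] for ps_name in ps_designs}
        for _ in range(n_imp):
            a_imp = a_obs.copy()
            a_imp[miss] = rng.binomial(1, p_miss)    # stochastic imputation of A
            for ps_name, X in ps_designs.items():
                ps = expit(X @ fit_logistic(X, a_imp))
                draws[ps_name].append(ipw_ate(a_imp, y, ps))
        for ps_name, vals in draws.items():
            # Rubin's rule for the point estimate: average over imputations.
            est[(imp_name, ps_name)] = float(np.mean(vals))
    return est

def rmse_table(n_rep=200, seed=2023, true_ate=1.0):
    """RMSE of the ATE estimate for every (imputation model, PS model) pair."""
    rng = np.random.default_rng(seed)
    runs = [simulate_once(rng, true_ate=true_ate) for _ in range(n_rep)]
    return {key: float(np.sqrt(np.mean([(r[key] - true_ate) ** 2 for r in runs])))
            for key in runs[0]}

if __name__ == "__main__":
    for (imp, ps), val in sorted(rmse_table().items(), key=lambda kv: kv[1]):
        print(f"imputation={imp:8s} PS={ps:10s} RMSE={val:.3f}")
```

In this hypothetical setup the "full" imputation model conditions on the outcome, which keeps the imputed exposure associated with `Y`; dropping `Y` from the imputation model makes imputed exposures independent of the outcome given `X1` and so attenuates the estimated effect toward zero.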