We consider logistic regression with two sets of discrete or categorical covariates that are missing at random (MAR), either separately or simultaneously. We examine the asymptotic properties of the two multiple imputation (MI) estimators given in Lee et al. (2023) for the parameters of this model. The proposed estimators of the asymptotic variances of the two MI estimators address a limitation of Rubin-type variance estimators (Rubin, 1987), which tend to underestimate the variances of the two MI estimators. Simulation results demonstrate that our two proposed MI methods outperform the complete-case, semiparametric inverse probability weighting, random forest MI using chained equations, and stochastic approximation of expectation-maximization methods. To illustrate the practical application of the methodology, we provide a real data example from a survey conducted in the Feng Chia night market in Taichung City, Taiwan.
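For readers unfamiliar with the Rubin-type variance that the abstract uses as its point of comparison, the following is a minimal sketch of the standard pooling rule from Rubin (1987): the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. It is not the asymptotic variance estimator proposed in this work. The function name `rubin_pool` and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool M multiply-imputed estimates with Rubin's rules (Rubin, 1987).

    estimates: array-like of shape (M, p), one coefficient vector per imputed data set.
    variances: array-like of shape (M, p), the matching squared standard errors.
    Returns the pooled estimate and the Rubin-type total variance, each of shape (p,).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.shape[0]                      # number of imputations M
    q_bar = q.mean(axis=0)              # pooled point estimate
    within = u.mean(axis=0)             # average within-imputation variance
    between = q.var(axis=0, ddof=1)     # between-imputation variance (sample variance)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical logistic regression coefficients from M = 3 imputed data sets.
est = [[0.50, -1.20], [0.54, -1.10], [0.46, -1.30]]
var = [[0.040, 0.090], [0.042, 0.088], [0.038, 0.092]]
pooled, total_var = rubin_pool(est, var)
print(pooled, total_var)
```

The sketch pools each coefficient separately, so it returns per-coefficient variances rather than a full covariance matrix.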