This work considers logistic regression when both the response and the predictor variables may be missing. Several existing approaches are reviewed, including complete case analysis, inverse probability weighting, multiple imputation and maximum likelihood. The methods are compared in a simulation study that evaluates the bias, variance and mean squared error of the estimators of the regression coefficients. In the simulations, maximum likelihood performs best, followed by multiple imputation with five imputations. The methods are applied to a case study on obesity among schoolchildren in the municipality of Viana do Castelo, northern Portugal, where a logistic regression model is used to predict the International Obesity Task Force (IOTF) indicator from physical examinations and past values of obesity status. All variables in the case study are potentially missing, with gender as the only exception. The results of the different methods are in good agreement and indicate that past values of IOTF and physical scores are relevant for predicting obesity. Practical recommendations are given.