Assessing regression models with discrete outcomes poses fundamental challenges. With discrete outcomes, standard assessment tools such as Pearson and deviance residuals do not follow their conventional reference distribution (normal) even under the true model, which calls into question the legitimacy of model assessment based on these tools. To fill this gap, we construct a new type of residual for general discrete outcomes, including ordinal and count outcomes. The proposed residuals are based on two layers of probability integral transformation. When at least one continuous covariate is available, the proposed residuals closely follow a uniform distribution (a normal distribution after transformation) under the correctly specified model. One can therefore construct visualizations such as QQ plots to check the overall fit of a model directly, and the shape of the QQ plot can further help identify possible causes of misspecification, such as overdispersion. We provide theoretical justification for the proposed residuals by establishing their asymptotic properties. Moreover, to assess the mean structure and identify potential covariates, we develop an ordered curve as a supplementary tool, which compares the partial sums of the outcomes with the partial sums of the fitted means. Through simulation, we demonstrate empirically that the proposed tools outperform commonly used residuals in various model assessment tasks. We also illustrate the workflow of model assessment using the proposed tools in data analysis.
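The idea behind the ordered curve, comparing partial sums of outcomes with partial sums of fitted means, can be illustrated with a minimal sketch. The abstract does not give the exact construction, so everything specific here is an assumption made for illustration: the function names `ordered_curve` and `max_gap`, the choice to sort observations by a covariate, the normalization of both partial sums by their totals, and the use of known means in place of means fitted from a model.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def ordered_curve(outcomes, fitted_means, order_by):
    """Hypothetical ordered curve: sort by `order_by`, then pair the running
    share of fitted means (x) with the running share of outcomes (y)."""
    idx = sorted(range(len(outcomes)), key=lambda i: order_by[i])
    total_y, total_mu = sum(outcomes), sum(fitted_means)
    xs, ys = [0.0], [0.0]
    run_y = run_mu = 0.0
    for i in idx:
        run_y += outcomes[i]
        run_mu += fitted_means[i]
        xs.append(run_mu / total_mu)
        ys.append(run_y / total_y)
    return xs, ys


def max_gap(xs, ys):
    """Largest vertical distance between the curve and the diagonal."""
    return max(abs(x - y) for x, y in zip(xs, ys))


rng = random.Random(0)
n = 2000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
true_mean = [math.exp(0.2 + xi) for xi in x]
y = [poisson_draw(rng, m) for m in true_mean]

# Assumption: the true means stand in for fitted means from a correct model,
# and a constant mean stands in for a model that omits the covariate x.
constant_mean = [sum(y) / n] * n

gap_ok = max_gap(*ordered_curve(y, true_mean, x))
gap_bad = max_gap(*ordered_curve(y, constant_mean, x))
print(f"gap with covariate: {gap_ok:.3f}; gap without covariate: {gap_bad:.3f}")
```

Under these assumptions, a curve that stays close to the diagonal suggests the mean structure is adequate along the ordering variable, while a pronounced bow suggests that the ordering variable carries information the fitted means are missing.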