Assessing regression models with discrete outcomes is challenging and raises fundamental issues. With discrete outcomes, standard assessment tools such as Pearson and deviance residuals do not follow the conventional reference distribution (normal) even under the true model, which calls into question the legitimacy of model assessment based on them. To fill this gap, we construct a new type of residual for general discrete outcomes, including ordinal and count outcomes. The proposed residuals are based on two layers of probability integral transformation. When at least one continuous covariate is available, the proposed residuals closely follow a uniform distribution (or a normal distribution after transformation) under a correctly specified model. One can therefore check the overall fit of a model directly with visualizations such as QQ plots, and the shape of a QQ plot can further help identify possible causes of misspecification, such as overdispersion. We justify the proposed residuals theoretically by establishing their asymptotic properties. Moreover, to assess the mean structure and identify potential covariates, we develop an ordered curve as a supplementary tool, based on comparing the partial sums of the outcomes with the partial sums of the fitted means. Through simulation, we demonstrate empirically that the proposed tools outperform commonly used residuals in various model assessment tasks. We also illustrate the model assessment workflow with the proposed tools in a data analysis.
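The partial-sum comparison behind the ordered curve can be illustrated with a minimal sketch. The abstract does not give the exact construction, so everything below is an assumption made for illustration: the function name `partial_sum_curves`, the scaling by the sample size, the simulated Poisson data, and the use of the true means in place of fitted means. The idea shown is only that, when observations are ordered by a covariate, the cumulative outcomes and cumulative means should track each other if the mean structure is adequate, and drift apart if a relevant covariate is omitted.

```python
import numpy as np


def partial_sum_curves(y, mu, z):
    """Hypothetical helper: cumulative sums of outcomes and means, ordered by z.

    Returns two arrays (cum_mu, cum_y), each scaled by the sample size.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    order = np.argsort(z, kind="stable")  # order observations by the covariate
    n = y.size
    cum_y = np.cumsum(y[order]) / n       # partial sums of observed outcomes
    cum_mu = np.cumsum(mu[order]) / n     # partial sums of (fitted) means
    return cum_mu, cum_y


# Simulated count data (assumed setup): Poisson outcome with a log-linear mean.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
mu_true = np.exp(0.5 + 1.0 * x)
y = rng.poisson(mu_true)

# Stand-in for a correctly specified fit: the true means themselves.
cum_mu_ok, cum_y_ok = partial_sum_curves(y, mu_true, x)
gap_correct = np.max(np.abs(cum_y_ok - cum_mu_ok))

# Stand-in for a fit that omits x: an intercept-only mean.
mu_const = np.full(n, y.mean())
cum_mu_bad, cum_y_bad = partial_sum_curves(y, mu_const, x)
gap_omitted = np.max(np.abs(cum_y_bad - cum_mu_bad))

print(f"max gap, mean includes x: {gap_correct:.3f}")
print(f"max gap, mean omits x:    {gap_omitted:.3f}")
```

Plotting `cum_y` against `cum_mu` gives a curve that stays near the 45-degree line in the first case and bends away from it in the second. In practice the means would come from the fitted model rather than from the simulation truth.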