Polytomous categorical data arise frequently in applied studies and can have either an individual or a grouped structure. In both cases, the generalized logit model is commonly used to relate the covariates to the response variable. After a model is fitted, one challenge is defining appropriate residuals and choosing diagnostic techniques. Because a polytomous response is multivariate, the raw, Pearson, and deviance residuals are vectors whose asymptotic distribution is generally unknown, which makes graphical display and interpretation difficult. Defining appropriate residuals and choosing suitable diagnostic tools is therefore important, especially for nominal data, for which few methods are available. This paper proposes randomized quantile residuals for individual and grouped nominal data, together with Euclidean and Mahalanobis distance measures as a way to reduce the dimension of the residuals. We carried out simulation studies for both data structures, using half-normal plots with simulation envelopes to assess model performance. The simulations showed that the quantile residuals performed well and that the distance measures made the diagnostic plots easier to interpret. We illustrate the proposed procedures with two applications to real data.
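To make the dimension-reduction idea concrete, the following is a minimal sketch of how a vector of residuals per observation can be collapsed into a single Euclidean or Mahalanobis distance. It is not the authors' implementation: it works on raw residuals (observed indicators minus fitted probabilities) for individual nominal data, estimates the covariance empirically, and drops the last category to avoid a singular covariance matrix. The function names, the simulated data, and these choices are all assumptions made for illustration.

```python
import numpy as np


def raw_residuals(y_onehot, probs):
    """Raw residual vectors: observed indicators minus fitted probabilities."""
    return np.asarray(y_onehot, dtype=float) - np.asarray(probs, dtype=float)


def euclidean_distance(res):
    """Euclidean norm of each residual vector (one scalar per observation)."""
    return np.sqrt((res ** 2).sum(axis=1))


def mahalanobis_distance(res):
    """Mahalanobis distance of each residual vector from the sample mean.

    Assumption: the covariance is estimated empirically from the residuals.
    The last category is dropped because each residual vector sums to zero,
    which makes the full covariance matrix singular.
    """
    reduced = res[:, :-1]
    diff = reduced - reduced.mean(axis=0)
    cov = np.atleast_2d(np.cov(reduced, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    # Quadratic form diff_i' * cov_inv * diff_i for every observation i.
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))


# Illustrative data only: 50 observations, 3 nominal categories.
# In practice `probs` would be the fitted probabilities of a generalized logit model.
rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=50)
y_onehot = np.array([rng.multinomial(1, p) for p in probs])

res = raw_residuals(y_onehot, probs)
d_euclid = euclidean_distance(res)
d_mahal = mahalanobis_distance(res)
print(d_euclid[:5], d_mahal[:5])
```

Each observation now has one non-negative number instead of a vector, which is the kind of quantity that can be ordered and displayed in a half-normal plot. The same two functions could in principle be applied to other residual vectors in place of the raw ones.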