Categorical responses arise naturally in many scientific disciplines. In many circumstances, there is no predetermined order for the response categories, and the response has to be modeled as nominal. In this study, we regard the order of the response categories as part of the statistical model, and show that the true order, when it exists, can be selected using likelihood-based model selection criteria. For predictive purposes, a statistical model with a chosen order may outperform models based on nominal responses, even if a true order does not exist. For multinomial logistic models, which are widely used for categorical responses, we show that there exist theoretically equivalent orders that cannot be distinguished by likelihood-based criteria, and we determine the connections between their maximum likelihood estimators. We use simulation studies and a real-data analysis to confirm the need for, and the benefits of, choosing the most appropriate order for categorical responses.
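The idea of treating the category order as a model choice can be illustrated with a minimal sketch. Several assumptions are ours and do not come from the study. The model is a proportional-odds (cumulative logit) model with a single covariate, which is only one member of the multinomial logistic family. AIC serves as the likelihood-based criterion. The data are simulated. The helper names `neg_loglik`, `fit_order` and `select_order` are hypothetical.

The sketch fits the model under every ordering of the categories and keeps the ordering with the smallest AIC. Because every ordering has the same number of parameters, AIC ranks the orders exactly as the maximized log-likelihood does. The sketch also shows one simple case of orders that likelihood criteria cannot tell apart. For this particular model, reversing the category order leaves the maximized likelihood unchanged, and the thresholds and the slope only change sign.

```python
import itertools

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def neg_loglik(params, x, y, J):
    """Negative log-likelihood of a proportional-odds (cumulative logit) model.

    Assumed model: P(Y <= j | x) = expit(theta_j - beta * x), j = 0, ..., J-2.
    params = [theta_0, log-gaps between successive thresholds (J-2 values), beta],
    so the thresholds are increasing by construction.
    """
    gaps = np.exp(params[1:J - 1])
    theta = np.concatenate(([params[0]], params[0] + np.cumsum(gaps)))
    beta = params[J - 1]
    n = len(x)
    cum = expit(theta[None, :] - beta * x[:, None])           # shape (n, J-1)
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    probs = np.diff(cum, axis=1)                               # shape (n, J)
    p_obs = probs[np.arange(n), y]
    return -np.sum(np.log(np.clip(p_obs, 1e-300, None)))


def fit_order(order, x, labels):
    """Recode labels by their position in `order`, fit by ML, return (loglik, AIC)."""
    J = len(order)
    rank = {cat: k for k, cat in enumerate(order)}
    y = np.array([rank[c] for c in labels])
    start = np.zeros(J)
    start[0] = -0.5                         # thresholds start at -0.5, 0.5, 1.5, ...
    res = minimize(neg_loglik, start, args=(x, y, J), method="BFGS")
    loglik = -res.fun
    aic = 2 * J - 2 * loglik                # J parameters: J-1 thresholds + 1 slope
    return loglik, aic


def select_order(x, labels):
    """Brute-force search over all orderings; return (best_order, results)."""
    cats = sorted(set(labels))
    results = {o: fit_order(o, x, labels) for o in itertools.permutations(cats)}
    best = min(results, key=lambda o: results[o][1])
    return best, results


# Simulated data (hypothetical): true order is B < C < A, slope 2, thresholds -1, 1.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
latent = 2.0 * x + rng.logistic(size=n)
y_true = np.searchsorted([-1.0, 1.0], latent)    # 0, 1, 2 in the true order
names = np.array(["B", "C", "A"])
labels = names[y_true].tolist()

best, results = select_order(x, labels)
for order, (ll, aic) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(order, round(ll, 2), round(aic, 2))
print("selected order:", best)
```

In this sketch the true order and its reverse reach the same maximized log-likelihood, up to optimizer tolerance, so either one may be printed as the selected order. Enumerating all orderings costs J! model fits, which keeps this brute-force approach practical only for a handful of categories.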