A primer on computational statistics for ordinal models with applications to survey data

The analysis of survey data is a common task in clinical trials, particularly when capturing quantities that are difficult to measure using, e.g., a technical device or a biochemical procedure. Typical examples are questionnaires about patients' well-being, pain, anxiety, quality of life or consent to an intervention. Data are captured on a discrete scale with only a limited number (usually three to ten) of possible answers, from which respondents pick the one that best reflects their personal opinion on the question. Such data are generally ordinal, since the answers can usually be arranged in increasing order, e.g., "bad", "neutral", "good" for well-being or "none", "mild", "moderate", "severe" for pain. Because responses are often stored numerically for data processing purposes, analysing survey data with ordinary linear regression (OLR) models seems natural. However, OLR assumptions are often not met: linear regression requires the response to have constant variance, and it can yield predictions outside the range of the response categories. Moreover, OLR only yields insights into the mean response, which, depending on the response distribution, may not be very representative. In contrast, ordinal regression models provide probability estimates for all response categories and thus describe the full response scale rather than just the mean. Although these methods are well described in the literature, they seem to be rarely applied to biomedical or survey data. In this paper, we give a concise overview of the fundamentals of ordinal models and their application to a real data set, outline how to use state-of-the-art software to fit them, and point out strengths, limitations and typical pitfalls. This article is a companion work to an ongoing vignette-based structured interview study in paediatric anaesthesia.
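
To make the contrast between OLR and ordinal regression concrete, here is a minimal sketch in plain Python. It assumes a cumulative logit (proportional odds) model, one common ordinal model, in the parameterisation logit P(Y <= k | x) = theta_k - beta * x. The thresholds THETA, the coefficient BETA and the OLR coefficients in the hypothetical helper `linear_prediction` are invented for illustration; they are not estimates from the study's data, and the code does not reproduce the software used in the paper.

```python
import math

def logistic(z: float) -> float:
    """Standard logistic CDF, i.e. the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(x: float, thresholds: list[float], beta: float) -> list[float]:
    """Category probabilities under a cumulative logit (proportional odds) model.

    P(Y <= k | x) = logistic(theta_k - beta * x) for k = 1..K-1, P(Y <= K | x) = 1,
    and P(Y = k | x) = P(Y <= k | x) - P(Y <= k-1 | x).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(t - beta * x) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of successive cumulative probabilities
        previous = c
    return probs

# Hypothetical parameters for a 4-category pain scale; NOT fitted to real data.
CATEGORIES = ["none", "mild", "moderate", "severe"]
THETA = [-1.0, 0.5, 2.0]  # increasing thresholds theta_1 < theta_2 < theta_3
BETA = 0.8                # positive beta shifts mass towards higher categories

def linear_prediction(x: float, intercept: float = 1.5, slope: float = 0.6) -> float:
    """Hypothetical OLR fit on the numeric codes 1..4, for contrast only."""
    return intercept + slope * x

if __name__ == "__main__":
    for x in (0.0, 2.0, 5.0):
        probs = category_probabilities(x, THETA, BETA)
        # Implied mean code under the ordinal model always lies within [1, 4].
        mean_code = sum(k * p for k, p in enumerate(probs, start=1))
        summary = ", ".join(f"{c}={p:.3f}" for c, p in zip(CATEGORIES, probs))
        print(f"x={x:.1f}: {summary}; ordinal mean code={mean_code:.2f}; "
              f"OLR prediction={linear_prediction(x):.2f}")
```

With these made-up values, the linear fit predicts 4.5 at x = 5, outside the 1 to 4 range of the scale, whereas the ordinal model returns a probability for every category that stays in [0, 1], sums to one, and shows how the whole distribution shifts as x grows.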

