Disease prediction is one of the central problems in biostatistical research. Some biomarkers are not only useful for diagnosing and screening diseases but are also associated with disease severity. A prediction model that estimates severity at the diagnosis or screening stage would therefore be valuable, for example for treatment prioritization. We focus on the combined task of screening and severity prediction, using a composite response variable such as \{healthy, mild, intermediate, severe\}. This type of response is ordinal, but because the two tasks do not necessarily share the same statistical structure, the conventional cumulative logit model (CLM) may not be suitable. To handle the composite ordinal response, we propose the Multi-task Cumulative Logit Model (MtCLM) with structural sparse regularization. The model is flexible enough to fit the different structures of the two tasks while capturing the structure they share. In addition, unlike the non-parallel cumulative logit model (NPCLM), another conventional flexible model, MtCLM is valid as a stochastic model over the entire predictor space. We conduct simulation experiments and a real data analysis to illustrate its prediction performance and interpretability.
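The validity issue mentioned for NPCLM can be illustrated with a minimal sketch. It is not the MtCLM implementation. It only assumes the standard cumulative logit form P(Y <= j | x) = sigmoid(a_j + b_j x) with a single predictor, and the intercepts, slopes, and function names below are hypothetical values chosen for illustration. With a shared slope (parallel CLM) and ordered intercepts, the cumulative probabilities stay ordered for every x. With category-specific slopes (NPCLM), they can cross, which yields negative category probabilities in part of the predictor space.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, intercepts, slopes):
    """Category probabilities implied by cumulative logits.

    Assumes P(Y <= j | x) = sigmoid(a_j + b_j * x) for each threshold j,
    and P(Y <= last category | x) = 1.
    """
    cum = [sigmoid(a + b * x) for a, b in zip(intercepts, slopes)] + [1.0]
    # Successive differences of the cumulative probabilities.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def is_valid(probs):
    """A probability vector must have no negative entries."""
    return all(p >= 0.0 for p in probs)


# Hypothetical parameters for four ordered categories
# (healthy, mild, intermediate, severe), hence three thresholds.
intercepts = [-1.0, 0.0, 1.0]
parallel_slopes = [0.5, 0.5, 0.5]       # shared slope: parallel CLM
non_parallel_slopes = [2.0, 0.5, -1.0]  # category-specific slopes: NPCLM

for x in (0.0, 3.0):
    p_clm = category_probs(x, intercepts, parallel_slopes)
    p_npclm = category_probs(x, intercepts, non_parallel_slopes)
    print(f"x={x}: CLM valid={is_valid(p_clm)}, NPCLM valid={is_valid(p_npclm)}")
```

In this sketch the non-parallel parameters give valid probabilities at x = 0 but not at x = 3, where the cumulative curves have crossed. This is the sense in which a model may fail to be valid over the entire predictor space.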