This work is motivated by the need to accurately model a vector of responses related to pediatric functional status using administrative health data from inpatient rehabilitation visits. The components of the response vector have known, structured interrelationships. To exploit these relationships in modeling, we develop a two-pronged regularization approach that borrows information across the responses. The first component of our approach encourages joint selection of the effects of each variable across possibly overlapping groups of related responses, and the second component encourages shrinkage of effects towards each other for related responses. As the responses in our motivating study are not normally distributed, our approach does not rely on an assumption of multivariate normality of the responses. We show that, with an adaptive version of our penalty, our approach yields the same asymptotic distribution of estimates as if we had known in advance which variables have non-zero effects and which variables have the same effects across some outcomes. We demonstrate the performance of our method in extensive numerical studies and in an application to the prediction of functional status of pediatric patients using administrative health data in a population of children with neurological injury or illness at a large children's hospital.