We develop a new method for multivariate scalar-on-multidimensional-distribution regression. Traditional approaches typically analyze univariate scalar outcomes in isolation or use unidimensional distributional representations as predictors. These approaches are suboptimal because (i) they fail to exploit the dependence among the distributional predictors, and (ii) they neglect the correlation structure of the response. To overcome these limitations, we propose a multivariate distributional analysis framework that combines multivariate density functions with multitask learning. We develop a computationally efficient semiparametric estimation method for modelling the effect of the latent joint density on a multivariate response of interest. Additionally, we introduce a new conformal algorithm for quantifying the uncertainty of regression models with multivariate responses and distributional predictors, which provides insight into the conditional distribution of the response. Comprehensive numerical simulations validate the proposed method and show that it outperforms traditional methods. We illustrate the method on tri-axial accelerometer data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014, modelling the association between cognitive scores across several domains and the distributional representation of physical activity in the older adult population. Our results highlight the advantages of the proposed approach and the value of incorporating the complete spatial information recorded by the accelerometer.
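As a rough illustration of the two ingredients named above (a density-valued predictor and conformal uncertainty for a multivariate response), the sketch below does three things. It represents each subject's tri-axial samples by a normalized 3-D histogram, which is a crude stand-in for the latent joint density. It then fits a multi-output ridge regression on the flattened histogram. Finally, it builds split-conformal prediction boxes using a max-of-standardized-residuals score. This is a generic split-conformal construction under stated assumptions. It is not the authors' semiparametric estimator or their conformal algorithm. The function names, bin grid, range limits, and simulated data are all hypothetical.

```python
import numpy as np


def joint_density_features(samples, bins=4, lims=(-3.0, 3.0)):
    """Flattened, normalized 3-D histogram of one subject's (n_i x 3) samples.

    Hypothetical stand-in for the latent joint density; points outside
    `lims` are dropped by np.histogramdd.
    """
    edges = [np.linspace(lims[0], lims[1], bins + 1)] * 3
    hist, _ = np.histogramdd(samples, bins=edges)
    total = hist.sum()
    return (hist / total).ravel() if total > 0 else hist.ravel()


def fit_ridge(X, Y, lam=1e-2):
    """Multi-output ridge regression; centering leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    B = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)
    return lambda X_new: (X_new - x_mean) @ B + y_mean


def split_conformal_boxes(X, Y, X_test, alpha=0.1, lam=1e-2, seed=0):
    """Split-conformal hyperrectangles for a multivariate response (generic, hypothetical)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    train, calib = idx[: len(X) // 2], idx[len(X) // 2:]
    predict = fit_ridge(X[train], Y[train], lam)
    # Per-response scale estimated on the training half only, so the
    # calibration scores stay exchangeable with a new test score.
    scale = (Y[train] - predict(X[train])).std(axis=0) + 1e-12
    # Nonconformity score: largest standardized absolute residual across responses.
    scores = np.max(np.abs(Y[calib] - predict(X[calib])) / scale, axis=1)
    n = len(calib)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]
    center = predict(X_test)
    return center - q * scale, center + q * scale


def simulate(n, W, rng):
    """Hypothetical data: shifted Gaussian tri-axial samples, linear responses."""
    subjects = [rng.normal(loc=rng.uniform(-1, 1), size=(200, 3)) for _ in range(n)]
    X = np.array([joint_density_features(s) for s in subjects])
    Y = 20 * X @ W + 0.1 * rng.normal(size=(n, W.shape[1]))
    return X, Y


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 2))  # 4**3 = 64 histogram cells, 2 responses
X, Y = simulate(300, W, rng)
X_test, Y_test = simulate(200, W, rng)
lower, upper = split_conformal_boxes(X, Y, X_test, alpha=0.1)
coverage = np.mean(np.all((Y_test >= lower) & (Y_test <= upper), axis=1))
print(f"empirical joint coverage: {coverage:.2f}")
```

The box-shaped region is only one simple choice. Under exchangeability it covers all response components jointly with probability at least 1 - alpha, but it ignores cross-response correlation that a more refined nonconformity score could exploit.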