We propose a flexible Bayesian approach for estimating the joint density of a multivariate outcome in the presence of categorical covariates. Using a Gaussian copula framework, our method captures the dependence structure across the coordinates of the multivariate response. The marginal distribution of each outcome, conditional on covariates, is modeled as a flexible mixture whose atoms are shared across coordinates, while the mixture weights vary with covariates through a novel structure based on Tucker tensor factorization, which enables the identification of coordinate-specific subsets of influential covariates. In particular, we replace the traditional mode matrices with coordinate-specific random partition models on the covariate levels, offering a flexible mechanism to aggregate covariate levels that have similar effects on the response. Additionally, to handle settings with many covariates, we introduce a Markov chain Monte Carlo algorithm that scales with the number of aggregated levels rather than the number of original levels, significantly reducing memory requirements and improving computational efficiency. We demonstrate the method's numerical performance through simulation experiments and its practical applicability through an analysis of NHANES dietary data.
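To make the ingredients above concrete, the following is a minimal sketch of a two-coordinate joint density built from a Gaussian copula and mixture marginals with shared atoms. It is an illustration under stated assumptions, not the authors' implementation: the normal kernels, the atom values, the single covariate with levels `low`/`medium`/`high`, the hard partition `CLUSTER_OF_LEVEL`, and the weight table `WEIGHTS` are all hypothetical, and the weights are fixed numbers rather than draws from a Tucker-type prior.

```python
from math import exp, sqrt
from statistics import NormalDist

# Shared atoms: every coordinate mixes over the same normal kernels (assumed values).
ATOMS = [NormalDist(-1.0, 0.5), NormalDist(0.0, 1.0), NormalDist(2.0, 0.7)]

# Hypothetical partition of one covariate's levels; levels in the same
# cluster are aggregated and therefore share mixture weights.
CLUSTER_OF_LEVEL = {"low": 0, "medium": 0, "high": 1}

# WEIGHTS[coordinate][cluster] -> mixture weights over ATOMS (assumed values).
WEIGHTS = [
    {0: [0.2, 0.5, 0.3], 1: [0.6, 0.1, 0.3]},
    {0: [0.1, 0.1, 0.8], 1: [0.3, 0.4, 0.3]},
]


def mixture_pdf(y, weights):
    """Marginal density: weighted sum of the shared kernel densities."""
    return sum(w * atom.pdf(y) for w, atom in zip(weights, ATOMS))


def mixture_cdf(y, weights):
    """Marginal CDF: weighted sum of the shared kernel CDFs."""
    return sum(w * atom.cdf(y) for w, atom in zip(weights, ATOMS))


def gaussian_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density at (u1, u2), with 0 < u1, u2 < 1."""
    z1 = NormalDist().inv_cdf(u1)
    z2 = NormalDist().inv_cdf(u2)
    det = 1.0 - rho * rho
    quad = (rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * det)
    return exp(-quad) / sqrt(det)


def joint_density(y1, y2, level, rho):
    """Joint density = copula density at the marginal CDFs times the marginal densities."""
    cluster = CLUSTER_OF_LEVEL[level]
    w1, w2 = WEIGHTS[0][cluster], WEIGHTS[1][cluster]
    u1, u2 = mixture_cdf(y1, w1), mixture_cdf(y2, w2)
    copula = gaussian_copula_density(u1, u2, rho)
    return copula * mixture_pdf(y1, w1) * mixture_pdf(y2, w2)


print(joint_density(0.3, 1.0, "low", 0.4))
print(joint_density(0.3, 1.0, "high", 0.4))
```

In this sketch the levels `low` and `medium` fall in the same cluster, so they yield identical densities, and setting `rho` to zero reduces the joint density to the product of the two marginals. The copula correlation controls dependence across coordinates only; the covariate enters solely through the mixture weights.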