The adoption of continuous shrinkage priors in high-dimensional linear models has gained momentum, driven by their theoretical and practical advantages. One of these shrinkage priors is the R2D2 prior, which comes with intuitive hyperparameters and well-understood theoretical properties. The core idea is to specify a prior on the proportion of explained variance $R^2$ and to conduct a Dirichlet decomposition to distribute the explained variance among all the regression terms of the model. Due to the properties of the Dirichlet distribution, the competition among variance components is confined to negative dependence structures that are fully determined by the individual components' means. Yet, in reality, specific coefficients or groups may compete differently for the total variability than the Dirichlet would allow for. In this work we address this limitation by proposing a generalization of the R2D2 prior, which we term the Generalized Decomposition R2 (GDR2) prior. Our new prior provides greater flexibility in expressing dependency structures as well as enhanced shrinkage properties. Specifically, we explore the capabilities of variance decomposition via logistic normal distributions. Through extensive simulations and real-world case studies, we demonstrate that GDR2 priors yield strongly improved out-of-sample predictive performance and parameter recovery compared to R2D2 priors with similar hyperparameter choices.
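The contrast between the two decompositions can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the implementation used in this work: it assumes a Beta prior on $R^2$, a normal kernel $\beta_j \sim N(0, \sigma^2 \tau^2 \phi_j)$ with $\tau^2 = R^2 / (1 - R^2)$, and a logistic normal obtained as the softmax of a multivariate normal draw (one of several possible parameterizations). All function names and hyperparameter values are hypothetical.

```python
import numpy as np


def dirichlet_shares(rng, alpha, n):
    """Draw n variance-share vectors phi from a Dirichlet(alpha)."""
    return rng.dirichlet(alpha, size=n)


def logistic_normal_shares(rng, mu, cov, n):
    """Draw n share vectors as the softmax of multivariate normal draws.

    Assumption: softmax over all K coordinates; other logistic normal
    parameterizations (e.g. additive log-ratio) are equally possible.
    """
    z = rng.multivariate_normal(mu, cov, size=n)
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def coefficients_from_shares(rng, shares, a=1.0, b=1.0, sigma=1.0):
    """Draw coefficients given shares (assumed Beta(a, b) prior on R^2)."""
    n, _ = shares.shape
    r2 = rng.beta(a, b, size=n)
    tau2 = r2 / (1.0 - r2)  # total explained-to-residual variance ratio
    scale = sigma * np.sqrt(tau2[:, None] * shares)
    return rng.normal(0.0, scale)


rng = np.random.default_rng(0)
n_draws = 20_000

# Dirichlet: every pair of shares is negatively correlated.
phi_dir = dirichlet_shares(rng, alpha=np.ones(3), n=n_draws)
corr_dir = np.corrcoef(phi_dir[:, 0], phi_dir[:, 1])[0, 1]

# Logistic normal: the covariance matrix lets components 0 and 1 co-vary.
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
phi_ln = logistic_normal_shares(rng, mu=np.zeros(3), cov=cov, n=n_draws)
corr_ln = np.corrcoef(phi_ln[:, 0], phi_ln[:, 1])[0, 1]

beta_draws = coefficients_from_shares(rng, phi_ln)
print(f"Dirichlet share correlation:       {corr_dir:.2f}")
print(f"Logistic normal share correlation: {corr_ln:.2f}")
```

In this sketch the Dirichlet shares can only be negatively correlated, whereas the covariance matrix of the underlying normal lets two shares rise and fall together, which is the kind of dependence the Dirichlet decomposition cannot express.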