This paper presents a loss-based generalized Bayesian methodology for high-dimensional robust regression with serially correlated errors and predictors. The proposed framework employs a novel scaled pseudo-Huber (SPH) loss function, which smooths the well-known Huber loss and balances quadratic and absolute linear loss behavior. This flexibility enables the framework to accommodate both thin-tailed and heavy-tailed data. The generalized Bayesian approach constructs a working likelihood from the SPH loss, which facilitates efficient and stable estimation while providing rigorous uncertainty quantification for all model parameters. Notably, this allows formal statistical inference without ad hoc tuning parameter selection while adapting to a wide range of tail behavior in the errors. By specifying appropriate prior distributions for the regression coefficients -- e.g., ridge priors for small or moderate-dimensional settings and spike-and-slab priors for high-dimensional settings -- the framework supports principled inference. We establish rigorous theoretical guarantees for the accurate estimation of the underlying model parameters and the correct selection of predictor variables under sparsity assumptions across a wide range of data-generating setups. Extensive simulation studies show that our approach outperforms traditional quadratic and absolute linear loss-based Bayesian regression methods, highlighting its flexibility and robustness in high-dimensional and challenging data contexts.
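To make the "quadratic near zero, linear in the tails" behavior concrete, the sketch below compares the classical Huber loss with the textbook pseudo-Huber loss, delta^2 * (sqrt(1 + (r/delta)^2) - 1). The abstract does not state the exact form of the SPH loss, so this is the standard unscaled variant and may differ from the authors' definition. The function `working_log_likelihood` and its `weight` parameter are hypothetical names, used only to illustrate how a loss can induce a Gibbs-style working likelihood.

```python
import math


def huber(r, delta=1.0):
    """Classical Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def pseudo_huber(r, delta=1.0):
    """Textbook pseudo-Huber loss, a smooth approximation to Huber.

    Behaves like r^2 / 2 for |r| << delta and like delta * |r| for |r| >> delta.
    Assumption: this is the standard form, not necessarily the paper's SPH loss.
    """
    return delta * delta * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)


def working_log_likelihood(residuals, delta=1.0, weight=1.0):
    """Hypothetical unnormalised log working likelihood: -weight * sum of losses."""
    return -weight * sum(pseudo_huber(r, delta) for r in residuals)


if __name__ == "__main__":
    # Small residuals track r^2 / 2; large residuals grow roughly linearly.
    for r in (0.01, 0.5, 1.0, 5.0, 50.0):
        print(f"r={r:6.2f}  huber={huber(r):9.4f}  pseudo_huber={pseudo_huber(r):9.4f}")
```

Because the pseudo-Huber loss is differentiable everywhere, gradient-based posterior samplers can be applied to the resulting working likelihood without special handling at the Huber threshold.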