Bayesian models are said to be fully robust against outliers if, asymptotically, observations infinitely far from the rest of the data do not influence the posterior. Early work in robust Bayesian inference concentrated on continuous distributions and i.i.d. observations. Robustness results were then extended to linear regression in the presence of infinite residuals, arising from either an outlying outcome or an outlying covariate. Recently, Hamura et al. (2025, arXiv:2106.10503) presented a count regression model with a Poisson-Rescaled Beta (Poisson-RSB) target distribution and Gaussian latent variables (GLVs) that is robust against infinitely large counts and able to handle zero-inflation. We build on the work of Hamura et al. and study the robustness properties of mixed Poisson regression models with GLVs in the presence of outlying data points arising from either corrupted covariates or corrupted target values. While in linear regression the two cases are interchangeable, since an infinite target and an infinite covariate both lead to infinite residuals, we show that in count regression an infinite covariate is not symmetric to an infinite target. Specifically, we show that mixed Poisson models are not asymptotically robust to outliers resulting from infinite covariates. We then consider three alternative mixed Poisson distributions (Poisson-Gamma, Poisson-log-t, and Poisson-RSB) as the target distribution and examine, theoretically, via simulations, and in real-world case studies, their behavior in the presence of three types of outliers: large target values, large covariate values, and small covariate values. Our results show that models robust to data points with an anomalous target are not robust to data points with anomalous covariates, calling for methodological development of models that are robust to covariate outliers.