Synthetic Priors

Bayesian inference in generalized linear models (GLMs) requires a prior on the coefficient vector $\beta$. Practitioners, however, naturally reason about response probabilities at specific covariate values, not about abstract log-odds parameters. We develop synthetic priors: informative Bayesian priors for GLMs grounded in Good's device of imaginary observations -- the principle that every conjugate prior is equivalent to a likelihood on pseudo-data from the same exponential family. The conditional means prior of Bedrick, Christensen, and Johnson (1996; hereafter BCJ) elicits independent Beta priors on the conditional mean response at $p$ expert-chosen design points; the induced prior on $\beta$ is a product of binomial likelihoods at synthetic data points. Combined with Pólya-Gamma data augmentation \citep{polson2013}, the posterior admits an exact conjugate Gibbs sampler -- no tuning, no Metropolis step -- because the real data augmented with the synthetic points can be treated as a standard logistic regression. We show that ridge regression and catalytic priors \citep{huang2020} are instances of Good's device, and identify prediction-powered inference \citep{angelopoulos2023ppi} as a structural analogue in the frequentist setting -- all three mediate a bias-variance tradeoff through a single informativeness parameter. We illustrate the approach on two benchmark problems: the Challenger O-ring data \citep{dalal1989}, where the BCJ prior provides a more moderate posterior predictive at the 31°F launch temperature; and a Phase~II atopic dermatitis dose-finding trial ($n = 300$), where the synthetic prior narrows 95\% credible intervals by 3--6\% and raises decision probabilities by up to 2 percentage points relative to a flat prior.
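
The pseudo-data reading of the conditional means prior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the logistic link, where a Beta($a_i$, $b_i$) prior at design point $x_i$ contributes $a_i$ pseudo-successes and $b_i$ pseudo-failures to the likelihood; it uses a single covariate plus intercept; and it finds the posterior mode by Newton-Raphson rather than running the Pólya-Gamma Gibbs sampler described above. The function names, the `tau` multiplier (a stand-in for an informativeness parameter), the design points, and the toy data are all hypothetical.

```python
import math


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def synthetic_rows(design, tau=1.0):
    """Turn elicited Beta(a, b) priors into pseudo-data rows (x, successes, failures).

    design: list of (x, a, b), one entry per design point.
    tau: hypothetical multiplier on the pseudo-counts (an assumption of this
    sketch); larger values make the prior more informative.
    """
    return [(x, tau * a, tau * b) for x, a, b in design]


def fit_mode(rows, n_iter=100, tol=1e-12):
    """Newton-Raphson mode of a logistic model with intercept b0 and slope b1.

    rows: list of (x, successes, failures); counts may be non-integer, so real
    and synthetic rows can be passed together.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, f in rows:
            n = s + f
            p = expit(b0 + b1 * x)
            r = s - n * p            # score contribution of this row
            w = n * p * (1.0 - p)    # information weight of this row
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: inverse information times score
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up elicitation: P(y=1 | x=-1) ~ Beta(3, 1) and P(y=1 | x=+1) ~ Beta(1, 3).
design = [(-1.0, 3.0, 1.0), (1.0, 1.0, 3.0)]
# Made-up grouped data: (x, successes, failures).
data = [(-1.0, 2.0, 3.0), (0.0, 1.0, 4.0), (1.0, 1.0, 4.0)]

prior_only = fit_mode(synthetic_rows(design))        # mode of the induced prior on beta
flat_prior = fit_mode(data)                          # maximum likelihood fit
posterior = fit_mode(data + synthetic_rows(design))  # posterior mode under the synthetic prior

for name, (b0, b1) in [("prior", prior_only), ("flat", flat_prior), ("posterior", posterior)]:
    print(name, round(expit(b0 - b1), 3), round(expit(b0 + b1), 3))
```

With only the two synthetic rows, the fitted probabilities at the design points equal $a_i / (a_i + b_i)$, here 0.75 and 0.25, which is one way to check that the pseudo-data encode the elicited beliefs. Rescaling the pseudo-counts by `tau` leaves that prior mode unchanged and only alters how strongly it pulls the posterior toward it.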
