We consider multi-environment prediction problems. We assume that environments change the distribution of a latent variable, while the mechanisms generating the observed covariates and targets remain stable conditional on that variable. For example, hospitals or clinical cohorts may differ in the prevalence of latent patient states, even though the relationships between those states, physiological measurements, and outcomes remain unchanged. Given a dataset from multiple environments, we formulate a Bayesian model for such problems and derive the corresponding variational objective. We show that this objective decomposes into per-environment terms and an additional cross-environment balancing term induced by the model's structure. We use an empirical Bayes method to set the prior and incorporate it into the objective. Based on this objective, we develop an amortized variational algorithm for posterior approximation and use the learned latent variables to form predictions in new environments. We study our approach through simulations and real-world studies of astronomical source identification, microbiome-based disease detection, and ICU sepsis prediction. Across these settings, our method outperforms previous approaches for prediction in new environments.
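To make the central assumption concrete, the following is a minimal sketch of a toy setting, not the paper's model or algorithm. It assumes a binary latent state `z`, a Gaussian covariate `x`, a binary target `y`, and conditional independence of `x` and `y` given `z`; the constants `MU` and `THETA`, the function names, and the prevalence values are all hypothetical. The mechanisms `p(x | z)` and `p(y | z)` are shared across environments, and only the prevalence `P(z = 1)` changes.

```python
import math

# Hypothetical shared mechanisms, identical in every environment.
MU = {0: -1.0, 1: 1.0}    # mean of x given latent state z (unit variance assumed)
THETA = {0: 0.1, 1: 0.8}  # P(y = 1 | z)


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_z1(x, prevalence):
    """P(z = 1 | x) in an environment where P(z = 1) = prevalence (Bayes' rule)."""
    num = prevalence * normal_pdf(x, MU[1])
    den = num + (1.0 - prevalence) * normal_pdf(x, MU[0])
    return num / den


def predict_y1(x, prevalence):
    """P(y = 1 | x), obtained by marginalizing over the latent state z."""
    p1 = posterior_z1(x, prevalence)
    return THETA[1] * p1 + THETA[0] * (1.0 - p1)


# Same covariate value, two environments that differ only in latent prevalence.
low = predict_y1(0.0, prevalence=0.2)   # approximately 0.24
high = predict_y1(0.0, prevalence=0.7)  # approximately 0.59
print(low, high)
```

In this toy setting the same measurement `x = 0.0` yields different predictions in the two environments even though neither mechanism changed, which suggests why a predictor fit to `p(y | x)` in one environment need not transfer to another, while the mechanisms conditional on `z` do.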