Inference at the data's edge: Gaussian processes for modeling and inference under model-dependency, poor overlap, and extrapolation

The Gaussian process (GP) is a highly flexible non-linear regression method that offers a principled way to handle uncertainty over predicted (counterfactual) values. It does so by computing a posterior distribution over predicted points as a function of a chosen model space and the observed data, in contrast to conventional approaches that effectively compute uncertainty estimates conditional on placing full faith in a fitted model. This is especially valuable under conditions of extrapolation or weak overlap, where model dependency poses a severe threat. We first offer an accessible explanation of GPs and provide an implementation suited to social science inference problems. In doing so, we reduce the number of user-chosen hyperparameters from three to zero. We then illustrate the settings in which GPs can be most valuable: those where conventional approaches have poor properties due to model dependency or extrapolation in data-sparse regions. Specifically, we apply the GP to (i) comparisons in which treated and control groups have poor covariate overlap; (ii) interrupted time-series designs, where models are fitted prior to an event but extrapolated after it; and (iii) regression discontinuity designs, which depend on model estimates taken at or just beyond the edge of their supporting data.
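To make the idea of a posterior distribution over predicted points concrete, the following is a minimal sketch of textbook GP regression in Python with NumPy. It is not the implementation described in the abstract: the squared-exponential kernel, the fixed values of `length_scale`, `signal_var`, and `noise_var`, the simulated data, and the function names `rbf_kernel` and `gp_posterior` are all assumptions made for illustration. The point it shows is that the posterior variance stays small where training data are dense and grows toward the prior variance as the test point moves beyond the edge of the data.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test,
                 length_scale=1.0, signal_var=1.0, noise_var=0.1):
    """Posterior mean and variance of the latent function at x_test.

    Hyperparameters are fixed by hand here (an illustrative assumption).
    """
    n = len(x_train)
    # Covariance of the noisy training outputs.
    K = rbf_kernel(x_train, x_train, length_scale, signal_var) + noise_var * np.eye(n)
    # Cross-covariance between training and test inputs.
    K_s = rbf_kernel(x_train, x_test, length_scale, signal_var)
    # Prior variance at each test input (diagonal of the test covariance).
    prior_var = np.diag(rbf_kernel(x_test, x_test, length_scale, signal_var))

    # Cholesky factorization gives a numerically stable solve of K^{-1} y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha

    # Posterior variance: prior variance minus the part explained by the data.
    v = np.linalg.solve(L, K_s)
    var = prior_var - np.sum(v**2, axis=0)
    return mean, var


# Simulated data: inputs observed only on [-2, 2].
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=30)
y_train = np.sin(x_train) + rng.normal(scale=0.3, size=30)

# One interior point, one just past the edge, one far outside the data.
x_test = np.array([0.0, 2.5, 5.0])
mean, var = gp_posterior(x_train, y_train, x_test)

for x, m, s2 in zip(x_test, mean, var):
    print(f"x = {x:4.1f}  posterior mean = {m:6.3f}  posterior sd = {np.sqrt(s2):.3f}")
```

In this sketch the posterior standard deviation widens from the interior point to the point just past the edge and again to the far extrapolation point, which is the behavior that makes GPs attractive under poor overlap and extrapolation. A model that conditions on a single fitted functional form would typically not reflect this growth in uncertainty to the same degree.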

