Estimating causal effects from observational data is a central problem in many domains. A common approach is to weight the observations so that the covariates are balanced and the weighted data distribution mimics that of a randomized experiment. We present Neural Balancing Weights (NBW), a generalized balancing-weight method for estimating the causal effects of an arbitrary mixture of discrete and continuous interventions. The weights are obtained by directly estimating the density ratio between the source and balanced distributions, which is done by optimizing a variational representation of an $f$-divergence. We use the $\alpha$-divergence because it can be optimized efficiently: it admits an estimator whose sample complexity is independent of the ground-truth divergence value, its mini-batch gradients are unbiased, and it mitigates the vanishing-gradient problem. In addition, we provide two supporting techniques: one for improving the generalization performance of the balancing weights and one for checking the balance of the weighted distribution. Finally, we discuss the sample size required for the weights, framing it as an instance of the curse of dimensionality that arises whenever multidimensional data are balanced. Our study provides a basic approach for estimating balancing weights for multidimensional data using variational $f$-divergences.
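The density-ratio idea described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: the data are a hypothetical table of counts for a binary treatment `a` and a binary covariate `x`, the balanced distribution is taken to be the product of the marginals $p(a)p(x)$, the neural network is replaced by one free parameter per cell, and expectations are computed exactly instead of over mini-batches. The loss is the generic variational form for the $\alpha$-divergence, $E_p[r^{\alpha}]/\alpha - E_q[r^{\alpha-1}]/(\alpha-1)$, whose minimizer is $r = q/p$; the paper's exact objective may differ.

```python
import math

# Hypothetical toy data: counts of (treatment a, covariate x) pairs.
counts = {(0, 0): 40, (0, 1): 10, (1, 0): 20, (1, 1): 30}
n = sum(counts.values())

# Source distribution p(a, x): the empirical joint distribution.
p = {k: c / n for k, c in counts.items()}

# Balanced distribution q(a, x) = p(a) * p(x): treatment independent of x.
p_a = {a: sum(v for (ai, _), v in p.items() if ai == a) for a in (0, 1)}
p_x = {x: sum(v for (_, xi), v in p.items() if xi == x) for x in (0, 1)}
q = {(a, x): p_a[a] * p_x[x] for (a, x) in p}


def fit_log_ratio(p, q, alpha=0.5, lr=1.0, steps=2000):
    """Minimize E_p[r^alpha]/alpha - E_q[r^(alpha-1)]/(alpha-1) with r = exp(t).

    One parameter t per cell stands in for a neural network (an assumption
    made for illustration). For 0 < alpha < 1 the loss is convex in t.
    """
    t = {k: 0.0 for k in p}
    for _ in range(steps):
        for k in t:
            # Derivative of the loss with respect to t[k].
            grad = (p[k] * math.exp(alpha * t[k])
                    - q[k] * math.exp((alpha - 1.0) * t[k]))
            t[k] -= lr * grad
    return t


t = fit_log_ratio(p, q)
weights = {k: math.exp(v) for k, v in t.items()}   # estimated q / p per cell
weighted = {k: p[k] * weights[k] for k in p}       # reweighted joint distribution

for k in sorted(weights):
    print(k, round(weights[k], 4), round(weighted[k], 4))
```

In this toy setting the learned weights recover the ratio $q/p$ in every cell, so the reweighted joint distribution factorizes into its marginals, which is the sense in which the weights balance the covariate across treatment levels.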