Moment restrictions and their conditional counterparts emerge in many areas of machine learning and statistics, ranging from causal inference to reinforcement learning. Estimators for these tasks, generally called methods of moments, include the prominent generalized method of moments (GMM), which has recently gained attention in causal inference. GMM is a special case of the broader family of empirical likelihood estimators, which approximate a population distribution by minimizing a $\varphi$-divergence to an empirical distribution. However, the use of $\varphi$-divergences effectively limits the candidate distributions to reweightings of the data samples. We lift this long-standing limitation and provide a method of moments that goes beyond data reweighting. This is achieved by defining an empirical likelihood estimator based on the maximum mean discrepancy (MMD), which we term the kernel method of moments (KMM). We provide a variant of our estimator for conditional moment restrictions and show that it is asymptotically first-order optimal for such problems. Finally, we show that our method achieves competitive performance on several conditional moment restriction tasks.
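To make the reweighting limitation concrete, the following sketch compares a $\varphi$-divergence with the MMD on toy discrete distributions. It is an illustration only, not the KMM estimator. It assumes the KL divergence $\mathrm{KL}(Q \,\|\, \hat P_n)$ as a representative $\varphi$-divergence, a Gaussian kernel with bandwidth 1, and hypothetical helper names (`gaussian_kernel`, `mmd_squared`, `kl_divergence`). For $P=\sum_i p_i\delta_{x_i}$ and $Q=\sum_j q_j\delta_{y_j}$, the squared MMD is $\sum_{i,i'}p_ip_{i'}k(x_i,x_{i'})+\sum_{j,j'}q_jq_{j'}k(y_j,y_{j'})-2\sum_{i,j}p_iq_jk(x_i,y_j)$, which stays finite even when $Q$ places mass on points that were never observed.

```python
import math
from typing import Dict, Sequence

# Illustrative sketch: helper names are hypothetical, and the Gaussian kernel
# with bandwidth 1.0 is an assumed choice, not taken from the paper.


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def _weighted_kernel_sum(
    xs: Sequence[float], p: Sequence[float],
    ys: Sequence[float], q: Sequence[float], bandwidth: float,
) -> float:
    """Computes sum_{i,j} p_i * q_j * k(x_i, y_j)."""
    return sum(
        pi * qj * gaussian_kernel(xi, yj, bandwidth)
        for xi, pi in zip(xs, p)
        for yj, qj in zip(ys, q)
    )


def mmd_squared(xs, p, ys, q, bandwidth: float = 1.0) -> float:
    """Squared MMD between sum_i p_i delta_{x_i} and sum_j q_j delta_{y_j}."""
    return (
        _weighted_kernel_sum(xs, p, xs, p, bandwidth)
        + _weighted_kernel_sum(ys, q, ys, q, bandwidth)
        - 2.0 * _weighted_kernel_sum(xs, p, ys, q, bandwidth)
    )


def kl_divergence(ys, q, xs, p) -> float:
    """KL(Q || P) for discrete Q on ys and P on xs.

    Infinite unless Q only puts mass on points in P's support,
    i.e. unless Q is a reweighting of P's atoms.
    """
    p_mass: Dict[float, float] = {}
    for x, pi in zip(xs, p):
        p_mass[x] = p_mass.get(x, 0.0) + pi
    q_mass: Dict[float, float] = {}
    for y, qj in zip(ys, q):
        q_mass[y] = q_mass.get(y, 0.0) + qj
    total = 0.0
    for y, qj in q_mass.items():
        if qj == 0.0:
            continue  # 0 * log(0 / p) = 0 by convention
        pj = p_mass.get(y, 0.0)
        if pj == 0.0:
            return math.inf  # Q puts mass where the empirical distribution has none
        total += qj * math.log(qj / pj)
    return total


# Empirical distribution: uniform weights on three observed points.
data = [0.0, 1.0, 2.0]
uniform = [1 / 3] * 3

# Candidate 1: a reweighting of the observed points.
reweighted = [0.2, 0.3, 0.5]
# Candidate 2: uniform weights, but every point shifted off the sample.
shifted = [x + 0.1 for x in data]

print(kl_divergence(data, reweighted, data, uniform))  # finite
print(kl_divergence(shifted, uniform, data, uniform))  # inf
print(mmd_squared(data, uniform, shifted, uniform))    # small and finite
```

The reweighted candidate has a finite KL divergence. Shifting every point by 0.1 makes the KL divergence infinite, while the MMD to the empirical distribution stays small (on the order of $10^{-3}$ here). This is the property that lets an MMD-based estimator consider candidate distributions beyond reweightings of the data.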