With the precipitous decline in response rates, researchers and pollsters are left with highly non-representative samples and must rely on constructed weights to make these samples representative of the desired target population. Though practitioners employ valuable expert knowledge to choose which variables $X$ must be adjusted for, they rarely defend particular functional forms relating these variables to the response process or the outcome. Unfortunately, commonly used calibration weights (which make the weighted mean of $X$ in the sample equal to that of the population) only ensure correct adjustment when the portions of the outcome and of the response process left unexplained by linear functions of $X$ are independent. To alleviate this functional form dependency, we describe kernel balancing for population weighting (kpop). This approach replaces the design matrix $\mathbf{X}$ with a kernel matrix $\mathbf{K}$ that encodes higher-order information about $\mathbf{X}$. Weights are then found that make the weighted average row of $\mathbf{K}$ among sampled units approximately equal to the average row in the target population. This produces good calibration on a wide range of smooth functions of $X$, without relying on the user to decide which variables in $X$, or which functions of them, to include. We describe the method and illustrate it by application to polling data from the 2016 U.S. presidential election.
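The weighting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian kernel with bandwidth `2 * (number of columns)`, the truncation to a fixed number of leading components of $\mathbf{K}$, and the entropy-balancing solver are all assumptions chosen only to make the example self-contained.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """K[i, j] = exp(-||A_i - B_j||^2 / bandwidth)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / bandwidth)


def entropy_balance(Z, target, n_iter=100, tol=1e-8):
    """Weights w_i proportional to exp(D_i @ lam), with lam chosen so that
    the weighted mean of Z matches `target` (damped Newton on the dual)."""
    D = Z - target
    lam = np.zeros(D.shape[1])

    def loss(l):
        s = D @ l
        return s.max() + np.log(np.exp(s - s.max()).sum())

    for _ in range(n_iter):
        s = D @ lam
        w = np.exp(s - s.max())
        w /= w.sum()
        grad = D.T @ w  # remaining imbalance
        if np.abs(grad).max() < tol:
            break
        hess = (D * w[:, None]).T @ D - np.outer(grad, grad)
        step = np.linalg.solve(hess + 1e-8 * np.eye(lam.size), grad)
        t = 1.0
        while loss(lam - t * step) > loss(lam) and t > 1e-8:
            t *= 0.5  # backtrack until the dual objective does not increase
        lam = lam - t * step
    s = D @ lam
    w = np.exp(s - s.max())
    return w / w.sum()


def kernel_balance_weights(X_sample, X_pop, bandwidth=None, n_components=10):
    """Sketch of kernel balancing: weights on sampled units that make the
    weighted average row of K close to the population's average row."""
    # Standardize with sample moments (assumes no constant columns).
    mu, sd = X_sample.mean(0), X_sample.std(0)
    Xs, Xp = (X_sample - mu) / sd, (X_pop - mu) / sd
    if bandwidth is None:
        bandwidth = 2.0 * Xs.shape[1]  # assumed default, not from the paper
    # Rows: all units; columns: sampled units used as kernel centers.
    K_all = np.vstack([gaussian_kernel(Xs, Xs, bandwidth),
                       gaussian_kernel(Xp, Xs, bandwidth)])
    center = K_all.mean(0)
    # Balance only the leading components of K (approximate balance).
    _, _, Vt = np.linalg.svd(K_all - center, full_matrices=False)
    V = Vt[:min(n_components, Vt.shape[0])].T
    Z_all = (K_all - center) @ V
    Z_all = Z_all / Z_all.std(0)  # rescale for numerical stability
    n_s = Xs.shape[0]
    return entropy_balance(Z_all[:n_s], Z_all[n_s:].mean(0))
```

Calling `kernel_balance_weights(X_sample, X_pop)` with two numeric arrays that share the same columns returns one non-negative weight per sampled unit, normalized to sum to one. Balancing only a few leading components makes the balance on $\mathbf{K}$ approximate, which mirrors the "approximately equal" wording above; how many components to keep is a tuning choice left open in this sketch.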