Spatial autocorrelation in regression models can bias standard errors downward and thus invalidate inference. The most common correction in applied economics is the spatial heteroskedasticity and autocorrelation consistent (HAC) standard error estimator introduced by Conley (1999). A critical input is the kernel bandwidth: the distance within which residuals are allowed to be correlated. Choosing this bandwidth remains an unresolved problem, and the literature offers no formal guidance. In this paper, I first document that the relationship between the bandwidth and the magnitude of spatial HAC standard errors is inverse-U shaped: both too-narrow and too-wide bandwidths lead to underestimated standard errors, contradicting the conventional wisdom that wider bandwidths yield more conservative inference. I then propose a simple, non-parametric, data-driven bandwidth selector based on the empirical covariogram of the regression residuals. In extensive Monte Carlo experiments calibrated to empirically relevant spatial correlation structures across the contiguous United States, I show that the proposed method controls the false positive rate at or near the nominal 5% level across a wide range of spatial correlation intensities and sample configurations. Comparing six kernel functions, I find that the Bartlett and Epanechnikov kernels deliver the best size control. An empirical application to U.S. county-level data illustrates the practical relevance of the method. The R package SpatialInference implements the proposed bandwidth selector.
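The abstract does not spell out the estimator that the bandwidth feeds into. As a rough illustration only (in NumPy rather than the paper's R package SpatialInference, whose internals are not shown here), a minimal Conley-style spatial HAC sketch with a Bartlett kernel might look as follows; the function names, Euclidean distances, and sandwich form are assumptions for this sketch, not the paper's implementation:

```python
import numpy as np

def bartlett_weights(dist, bandwidth):
    """Bartlett kernel: weight decays linearly with distance, zero beyond the bandwidth."""
    w = 1.0 - dist / bandwidth
    w[w < 0] = 0.0
    return w

def conley_hac_se(X, y, coords, bandwidth):
    """Sketch of Conley (1999)-style spatial HAC standard errors for OLS.

    X      : (n, k) regressor matrix (include a constant column yourself)
    coords : (n, 2) point locations; Euclidean distance is assumed here
    Returns the OLS coefficients and their spatial HAC standard errors.
    """
    beta = np.linalg.solve(X.T @ X, X.T @ y)          # OLS coefficients
    u = y - X @ beta                                  # regression residuals
    # Pairwise distances and kernel weights between all observations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = bartlett_weights(d, bandwidth)
    # "Meat" of the sandwich: sum_ij w(d_ij) * u_i u_j * x_i x_j'
    Xu = X * u[:, None]
    meat = Xu.T @ W @ Xu
    bread = np.linalg.inv(X.T @ X)
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))
```

Two properties of this construction connect to the abstract's argument: as the bandwidth shrinks toward zero the kernel matrix collapses to the identity and the estimator reduces to heteroskedasticity-robust (HC0) standard errors, while a very wide bandwidth averages in many near-zero empirical covariances, which is one intuition for the inverse-U relationship the paper documents.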