The linear instrumental variable (IV) model is widely used in observational studies, yet its validity hinges on strong assumptions. Classical specification tests such as the Sargan-Hansen J test are limited to overidentified settings and are therefore not applicable in the common just-identified case, where the number of instruments equals the number of endogenous variables. We propose a novel test of well-specification for the linear IV model under the assumption that the structural error is mean independent of the instruments. This assumption enables specification testing even in the just-identified setting. Our approach builds on the idea of residual prediction: if the two-stage least squares residuals can be predicted from the instruments better than chance, this indicates misspecification. The resulting test employs sample splitting and a user-chosen machine learning method, and we show asymptotic type I error control and consistency against a broad class of alternatives. We further show how the proposed testing principle can be adapted to settings with weak or many instruments via an Anderson-Rubin-type inversion, thereby substantially extending its applicability. The tests accommodate heteroskedasticity- and cluster-robust inference and are implemented in the R package RPIV and the ivmodels software package for Python.
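The core procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation (the paper's tests live in the RPIV and ivmodels packages); it is a simplified example under assumed choices: a simulated just-identified model, a random forest as the user-chosen learner, and a one-sided t-test comparing the held-out squared loss of the learner's residual predictions against the trivial zero prediction.

```python
# Illustrative sketch of a residual prediction test for a just-identified
# linear IV model, assuming E[error | Z] = 0 under the null. The data-
# generating process, the random-forest learner, and the loss-comparison
# t-test are choices made for this example, not the paper's exact recipe.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000

# Simulate a correctly specified linear IV model (the null hypothesis).
Z = rng.normal(size=(n, 1))                   # one instrument
U = rng.normal(size=n)                        # unobserved confounder
X = Z[:, 0] + U + rng.normal(size=n)          # endogenous regressor
Y = 2.0 * X + U + rng.normal(size=n)          # structural equation

# Two-stage least squares; just identified, so beta = (Z'X)^{-1} Z'Y.
beta_hat = (Z[:, 0] @ Y) / (Z[:, 0] @ X)
resid = Y - X * beta_hat                      # 2SLS residuals

# Sample splitting: learn E[resid | Z] on one half, evaluate on the other.
half = n // 2
learner = RandomForestRegressor(n_estimators=100, random_state=0)
learner.fit(Z[:half], resid[:half])
pred = learner.predict(Z[half:])

# If the learner beats the trivial zero prediction on held-out data,
# this indicates misspecification: test whether the per-observation
# squared-loss improvement has positive mean.
loss_diff = resid[half:] ** 2 - (resid[half:] - pred) ** 2
t_stat, p_value = stats.ttest_1samp(loss_diff, 0.0, alternative="greater")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.3f}")
```

Since the simulated model is correctly specified, the held-out predictions should not systematically beat zero and the p-value should typically be large; replacing the structural equation with a nonlinear one (e.g. adding an `X**2` term) would give the test something to detect.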