The model-X knockoffs framework provides a flexible tool for achieving finite-sample false discovery rate (FDR) control in variable selection in arbitrary dimensions, without assuming any dependence structure of the response on the covariates. It also completely bypasses the use of conventional p-values, making it especially appealing in high-dimensional nonlinear models. Existing works have focused on the setting of independent and identically distributed observations. Yet time series data are prevalent in practical applications in fields such as economics and the social sciences. This motivates the study of model-X knockoffs inference for time series data. In this paper, we make an initial attempt to establish the theoretical and methodological foundations for model-X knockoffs inference for time series data. We suggest the method of time series knockoffs inference (TSKI), which exploits the ideas of subsampling and e-values to address the difficulty caused by serial dependence. We also generalize robust knockoffs inference to the time series setting and relax the assumption of a known covariate distribution required by model-X knockoffs, because such an assumption is overly stringent for time series data. We establish sufficient conditions under which TSKI achieves asymptotic FDR control. Our technical analysis reveals the effects of serial dependence and of the unknown covariate distribution on the FDR control. We conduct a power analysis of TSKI using the Lasso coefficient difference knockoff statistic under linear time series models. The finite-sample performance of TSKI is illustrated with several simulation examples and an economic inflation study.
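To give a concrete sense of how e-values from several subsamples can be combined into a single selection, the following is a minimal sketch and not the TSKI procedure itself, whose details the abstract does not give. It assumes that each subsample has already produced one e-value per variable, averages them across subsamples, and applies the generic e-BH rule at a target FDR level. The function names `average_e_values` and `ebh_select` and the numbers in the example matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def average_e_values(e_matrix):
    """Average e-values across subsamples.

    e_matrix has shape (n_subsamples, n_variables). The average of
    e-values for the same hypothesis is again an e-value.
    """
    return np.asarray(e_matrix, dtype=float).mean(axis=0)


def ebh_select(e_values, alpha):
    """Generic e-BH rule: reject the k* hypotheses with the largest
    e-values, where k* is the largest k such that the k-th largest
    e-value is at least p / (alpha * k)."""
    e = np.asarray(e_values, dtype=float)
    p = e.size
    order = np.argsort(-e)              # indices sorted by decreasing e-value
    sorted_e = e[order]
    ks = np.arange(1, p + 1)
    passes = ks * sorted_e >= p / alpha
    if not passes.any():
        return np.array([], dtype=int)  # nothing selected
    k_star = ks[passes].max()
    return np.sort(order[:k_star])


# Hypothetical e-values: 3 subsamples (rows) by 5 variables (columns).
e_matrix = [
    [60.0, 0.0, 30.0, 1.5, 0.0],
    [30.0, 3.0, 15.0, 0.0, 0.0],
    [30.0, 0.0, 30.0, 0.0, 0.0],
]
avg_e = average_e_values(e_matrix)
selected = ebh_select(avg_e, alpha=0.2)
print("averaged e-values:", avg_e)
print("selected variable indices:", selected)
```

Averaging is used here because it preserves the e-value property regardless of how the subsamples depend on one another, which is one reason e-values are convenient under serial dependence. How the per-subsample e-values are built from knockoff statistics is left out of this sketch.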