Determining the number of factors in high-dimensional factor models remains a fundamental challenge, particularly when data are incomplete. This paper introduces the concept of identifiable factors, those that can be reliably recovered despite missing observations, and proposes the Missingness-Adaptive Thresholding Estimator (MATE). To our knowledge, MATE is the first missingness-adaptive framework for factor number determination that accommodates both homogeneous and heterogeneous missingness without imposing restrictive assumptions on factor strength. Notably, it operates without data imputation, circumventing the computational burden associated with most existing approaches. We establish a rigorous theoretical foundation for MATE, proving its consistency under a range of structural conditions. Extensive simulations and real-world applications demonstrate that MATE consistently outperforms state-of-the-art methods, exhibiting superior robustness in settings with high missingness rates and weak factor signals.