This paper addresses the problem of forecasting corporate distress, which poses three statistical challenges: (i) right censoring, (ii) high-dimensional predictors, and (iii) mixed-frequency data. To address them, we introduce a novel high-dimensional censored MIDAS (Mixed Data Sampling) logistic regression. Our approach handles censoring through inverse probability weighting and achieves accurate estimation with a large number of mixed-frequency predictors by employing a sparse-group penalty. We establish finite-sample bounds for the estimation error that account for censoring, MIDAS approximation error, and heavy tails. For statistical inference, we develop a de-sparsified version of the proposed penalized estimator and establish its asymptotic theory, which enables valid inference in high-dimensional settings with censoring. We show that censoring induces a nonstandard variance structure for the de-sparsified estimator, a feature that, to the best of our knowledge, has not been studied in the existing literature. Monte Carlo simulations demonstrate the superior performance of the method. Finally, we present an extensive application of our methodology to predict the financial distress of Chinese listed firms and to identify covariates that are statistically significant for predicting distress. The procedure is implemented in the R package \texttt{Survivalml}.
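To make the inverse probability weighting step concrete, the following is a minimal Python sketch, not the \texttt{Survivalml} implementation. It assumes a fixed prediction horizon, independent censoring, a Kaplan-Meier estimate of the censoring distribution, and no ties between event and censoring times. The function names `censoring_survival`, `ipcw_weights`, and `weighted_logistic_loss` are hypothetical, and the exact weighting scheme used in the paper may differ. The MIDAS aggregation and the sparse-group penalty are omitted.

```python
import numpy as np

def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(u) = P(C > u), treating censoring as the 'event'."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    cens_times = np.unique(times[events == 0])  # sorted distinct censoring times
    at_risk = np.array([(times >= u).sum() for u in cens_times])
    n_cens = np.array([((times == u) & (events == 0)).sum() for u in cens_times])
    # surv[k] is the value of G after the first k censoring times.
    surv = np.concatenate(([1.0], np.cumprod(1.0 - n_cens / at_risk)))

    def G(u, left_limit=False):
        # left_limit=True returns G(u-), the value just before time u.
        side = "left" if left_limit else "right"
        return surv[np.searchsorted(cens_times, u, side=side)]

    return G

def ipcw_weights(times, events, horizon):
    """Binary labels 1{distress by horizon} and inverse probability of censoring weights."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    G = censoring_survival(times, events)
    y = ((times <= horizon) & (events == 1)).astype(int)
    # The label is known if distress was observed, or the firm was followed past the horizon.
    known = (times > horizon) | (events == 1)
    g = np.where(times > horizon, G(horizon), G(times, left_limit=True))
    w = np.where(known, 1.0 / g, 0.0)  # firms censored before the horizon get weight 0
    return y, w

def weighted_logistic_loss(beta, X, y, w):
    """IPCW-weighted negative log-likelihood of a logistic model (no penalty)."""
    z = X @ beta
    return np.mean(w * (np.logaddexp(0.0, z) - y * z))

# Toy data: follow-up times and indicators (1 = distress observed, 0 = censored).
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
events = np.array([1, 0, 1, 0, 1])
y, w = ipcw_weights(times, events, horizon=3.5)
X = np.column_stack([np.ones(5), times])  # intercept plus one illustrative covariate
loss_at_zero = weighted_logistic_loss(np.zeros(2), X, y, w)
```

In this sketch, firms whose label is known are up-weighted to stand in for comparable firms lost to censoring, and the weighted loss could then be passed to any penalized solver.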