CatNet: Effective FDR Control in LSTM with Gaussian Mirrors and SHAP Feature Importance

We introduce CatNet, an algorithm that controls the False Discovery Rate (FDR) and selects significant features in LSTM models using the Gaussian Mirror (GM) method. To measure feature importance for LSTM in time series, we introduce a vector of derivatives of SHapley Additive exPlanations (SHAP) values. We also propose a new kernel-based dependence measure to avoid multicollinearity in the GM algorithm, making feature selection robust while keeping the FDR controlled. We use simulated data to evaluate CatNet's performance in both linear models and LSTM models with different link functions. The algorithm controls the FDR while maintaining high statistical power in all cases. We also evaluate the algorithm in low-dimensional and high-dimensional settings, demonstrating its robustness across input dimensions. To evaluate CatNet in a real-world application, we construct a multi-factor investment portfolio to forecast the prices of S&P 500 index components. The results show that our model achieves higher predictive accuracy than traditional LSTM models without feature selection and FDR control. Additionally, CatNet captures common market-driving features, which supports informed decision-making in financial markets by making predictions more interpretable. Our study integrates the Gaussian Mirror algorithm with LSTM models for the first time and introduces SHAP values as a new feature importance metric for FDR control methods, advancing feature selection and error control for neural networks.
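As a rough illustration of the FDR-controlled selection step described above, the following Python sketch computes mirror statistics and a data-driven cutoff. It assumes the standard Gaussian Mirror construction, in which each feature is perturbed into a "plus" and a "minus" copy and the two resulting importance scores are combined into one statistic; the abstract itself does not spell out these formulas. The function names, the generic importance inputs, and the target level `q` are hypothetical, and the SHAP-derivative importance and kernel-based dependence measure used by CatNet are not reproduced here.

```python
import numpy as np


def mirror_statistics(imp_plus, imp_minus):
    """Combine importance scores of the two mirrored copies of each feature.

    Assumed form: M_j = |a_j + b_j| - |a_j - b_j|, which tends to be large and
    positive for relevant features and roughly symmetric about zero for nulls.
    """
    a = np.asarray(imp_plus, dtype=float)
    b = np.asarray(imp_minus, dtype=float)
    return np.abs(a + b) - np.abs(a - b)


def gm_threshold(mirror_stats, q=0.1):
    """Smallest cutoff t whose estimated false discovery proportion is <= q.

    The estimate uses the negative tail as a proxy for false positives:
    #{M_j <= -t} / max(#{M_j >= t}, 1).
    """
    m = np.asarray(mirror_stats, dtype=float)
    candidates = np.sort(np.abs(m[m != 0]))
    for t in candidates:
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return float(t)
    return float("inf")  # no cutoff meets the target level; select nothing


def select_features(mirror_stats, q=0.1):
    """Indices of features whose mirror statistic reaches the cutoff."""
    m = np.asarray(mirror_stats, dtype=float)
    return np.flatnonzero(m >= gm_threshold(m, q))
```

In this sketch, a larger `q` admits a lower cutoff and therefore more selected features, at the cost of a higher tolerated share of false discoveries.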
