The era of big data opens new avenues for financial distress prediction. To evaluate the financial status of listed companies more accurately, this study builds a financial distress prediction indicator system from multi-source data, integrating three sources: the company's internal management, the external market, and online public opinion. To address the redundancy and dimensionality explosion that arise when multi-source data are fused, we propose a feature selection method that combines maximum relevance and minimum redundancy with support vector machine recursive feature elimination (MRMR-SVM-RFE), and we build a financial distress prediction model on the selected features. To verify the model's effectiveness, we apply back propagation (BP), support vector machine (SVM), and gradient boosted decision tree (GBDT) classification algorithms in an empirical study of China's listed companies under different financial distress prediction indicator systems. The results show that MRMR-SVM-RFE feature selection effectively extracts information from multi-source fused data: the selected feature set yields higher prediction accuracy than the original data, and the BP classification model outperforms linear regression (LR), decision tree (DT), and random forest (RF).
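The two-stage idea behind MRMR-SVM-RFE can be illustrated with a minimal sketch. The abstract does not specify how the two stages are combined, so everything below is an assumption: a greedy mRMR pre-filter (mutual information as relevance, absolute Pearson correlation as a redundancy proxy) followed by recursive feature elimination with a linear SVM from scikit-learn. The function names `mrmr_rank` and `mrmr_svm_rfe`, the parameters `k_mrmr` and `k_final`, and the synthetic dataset are hypothetical and stand in for the study's actual indicators and settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mrmr_rank(X, y, k, random_state=0):
    """Greedily pick k features with high relevance and low redundancy."""
    # Relevance: estimated mutual information between each feature and the label.
    relevance = mutual_info_classif(X, y, random_state=random_state)
    # Redundancy proxy (assumption): absolute Pearson correlation between features.
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    remaining = [j for j in range(X.shape[1]) if j != selected[0]]
    while len(selected) < k and remaining:
        # Score = relevance minus mean redundancy with already selected features.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


def mrmr_svm_rfe(X, y, k_mrmr, k_final):
    """Stage 1: mRMR pre-filter. Stage 2: SVM-RFE on the surviving features."""
    X = StandardScaler().fit_transform(X)
    candidates = mrmr_rank(X, y, k_mrmr)
    # A linear kernel is needed so that RFE can rank features by weight magnitude.
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=k_final, step=1)
    rfe.fit(X[:, candidates], y)
    # Map the RFE mask back to column indices of the original matrix.
    return [candidates[i] for i in np.flatnonzero(rfe.support_)]


# Synthetic stand-in for a fused multi-source indicator matrix (not real data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, n_redundant=10, random_state=0
)
chosen = mrmr_svm_rfe(X, y, k_mrmr=15, k_final=6)
print("Selected feature indices:", chosen)
```

The returned indices can then be used to subset the indicator matrix before training a BP, SVM, or GBDT classifier. Running mRMR first keeps the costlier SVM-RFE stage on a smaller candidate set, which is one plausible way to handle the dimensionality problem described above.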