This study applies BERT, a representative contextualized language model, to corporate disclosure data to predict impending bankruptcies. Prior literature on bankruptcy prediction focuses mainly on developing more sophisticated prediction methodologies based on financial variables. In contrast, we focus on improving the quality of the input dataset. Specifically, we employ a BERT model to perform sentiment analysis on MD&A (Management's Discussion and Analysis) disclosures. We show that BERT outperforms dictionary-based and Word2Vec-based predictions in terms of adjusted R-squared in logistic regression, k-nearest neighbors (kNN-5), and linear-kernel support vector machine (SVM) models. Further, instead of pre-training the BERT model from scratch, we apply self-learning with confidence-based filtering to corporate disclosure data (10-K filings). We achieve an accuracy of 91.56% and demonstrate that this domain adaptation procedure brings a significant improvement in prediction accuracy.
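The abstract does not spell out how self-learning with confidence-based filtering is carried out, so the following is only a minimal sketch of one common form of the idea: a model trained on labeled text predicts labels for unlabeled text, and only predictions whose top class probability clears a threshold are added to the training set. The function names `filter_confident` and `self_train`, the `train_fn` callback, the threshold of 0.9, and the number of rounds are all hypothetical and are not taken from the study.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]                      # (text, class label)
Predictor = Callable[[str], Sequence[float]]   # text -> class probabilities


def filter_confident(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (example_index, pseudo_label) pairs whose top probability >= threshold."""
    selected = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=lambda k: p[k])  # index of the most likely class
        if p[label] >= threshold:
            selected.append((i, label))
    return selected


def self_train(
    train_fn: Callable[[List[Example]], Predictor],  # hypothetical: fits a model, returns a predictor
    labeled: Sequence[Example],
    unlabeled: Sequence[str],
    threshold: float = 0.9,   # assumed value, not from the study
    rounds: int = 3,          # assumed value, not from the study
) -> Tuple[Predictor, List[Example], List[str]]:
    """Iteratively grow the labeled set with confident pseudo-labels."""
    labeled = list(labeled)
    pool = list(unlabeled)
    predict = train_fn(labeled)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict(text) for text in pool]
        picked = filter_confident(probs, threshold)
        if not picked:
            break  # nothing is confident enough; stop early
        picked_idx = {i for i, _ in picked}
        labeled.extend((pool[i], y) for i, y in picked)
        pool = [t for i, t in enumerate(pool) if i not in picked_idx]
        predict = train_fn(labeled)  # retrain on the enlarged labeled set
    return predict, labeled, pool
```

In a setting like the one described, `train_fn` would presumably fine-tune a BERT sentiment classifier on the labeled disclosures, and `unlabeled` would hold 10-K passages. The threshold trades coverage against pseudo-label noise: a higher value admits fewer but cleaner examples.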