Textual data from financial filings, e.g., the Management's Discussion & Analysis (MDA) section of Form 10-K, has been used to improve the prediction accuracy of bankruptcy models. In practice, however, the MDA section cannot be obtained for all public companies, for two main reasons: (i) not all companies are obliged to submit the MDA, and (ii) technical problems arise when crawling and scraping the MDA section. To the best of our knowledge, this research is the first to introduce multimodal learning into bankruptcy prediction models in order to address the problem that the MDA text is unavailable for some companies. We use the Conditional Multimodal Discriminative (CMMD) model to learn multimodal representations that embed information from the accounting, market, and textual modalities. Training the CMMD model requires a sample in which all data modalities are observed. At test time, the CMMD model needs access only to the accounting and market modalities to generate multimodal representations, which are then used to make bankruptcy predictions. This makes bankruptcy prediction models that use textual data practical, since accounting and market data, unlike textual data, are available for all companies. Our empirical results show that the classification performance of the proposed methodology is superior to that of a large number of traditional classifier models. We also show that the proposed methodology overcomes a limitation of previous bankruptcy models that use textual data, namely that they can make predictions for only a small proportion of companies.
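The asymmetry described in the abstract (all three modalities at training time, only accounting and market data at test time) can be illustrated with a toy sketch. This is not the CMMD model: it is a deliberately simplified stand-in that first learns to approximate a text-derived score from the always-available modalities and then classifies on top of that learned representation. The class `MissingTextPredictor`, the helper functions, and the synthetic data generator are all hypothetical names introduced for illustration, and each modality is reduced to a single number per company.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(s):
    # Clamp the score to avoid overflow in math.exp for extreme values.
    s = max(-30.0, min(30.0, s))
    return 1.0 / (1.0 + math.exp(-s))


def fit_gd(X, targets, link, lr=0.1, epochs=500):
    """Batch gradient descent for a linear model with the given link.

    With an identity link this minimises squared error (up to a constant
    factor); with a sigmoid link it is logistic regression. Both share
    the gradient form (prediction - target) * x.
    """
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, targets):
            err = link(dot(w, x) + b) - t
            gb += err
            for j, xj in enumerate(x):
                gw[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


class MissingTextPredictor:
    """Hypothetical toy model: text is needed to fit, not to predict."""

    def fit(self, accounting, market, text, labels):
        avail = [[a, m] for a, m in zip(accounting, market)]
        # Stage 1: approximate the text-derived score from the
        # modalities that are available for every company.
        self.rep_w, self.rep_b = fit_gd(avail, text, link=lambda s: s)
        # Stage 2: classify using the available modalities plus the
        # learned representation (assumption: one scalar per company).
        feats = [x + [self.represent(x)] for x in avail]
        self.clf_w, self.clf_b = fit_gd(feats, labels, link=sigmoid)
        return self

    def represent(self, x):
        return dot(self.rep_w, x) + self.rep_b

    def predict_proba(self, accounting, market):
        # No text argument: prediction works for companies without MDA.
        probs = []
        for a, m in zip(accounting, market):
            x = [a, m]
            score = dot(self.clf_w, x + [self.represent(x)]) + self.clf_b
            probs.append(sigmoid(score))
        return probs


def make_synthetic(n, seed=0):
    """Synthetic data only; no relation to any real filings."""
    rng = random.Random(seed)
    acc = [rng.gauss(0, 1) for _ in range(n)]
    mkt = [rng.gauss(0, 1) for _ in range(n)]
    txt = [-0.8 * a - 0.5 * m + rng.gauss(0, 0.3) for a, m in zip(acc, mkt)]
    y = [1 if t + rng.gauss(0, 0.3) > 1.0 else 0 for t in txt]
    return acc, mkt, txt, y


acc, mkt, txt, y = make_synthetic(300)
model = MissingTextPredictor().fit(acc, mkt, txt, y)
probs = model.predict_proba(acc, mkt)  # text is not passed here
```

The point of the sketch is the interface rather than the estimator: `fit` takes the text modality, whereas `predict_proba` does not, so a company with no retrievable MDA can still be scored.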