In this paper, we introduce CR-COPEC (Causal Rationale of Corporate Performance Changes), a comprehensive, large-scale domain-adaptation dataset of causal sentences, drawn from financial reports, for detecting changes in corporate financial performance. CR-COPEC makes two major contributions. First, it identifies causal rationales in the 10-K annual reports of U.S. companies, which contain experts' causal analysis written formally in accordance with accounting standards. Individual investors and analysts can use the dataset as a source of material information for investment and decision making, without the considerable effort of reading through every document. Second, it carefully accounts for the different characteristics that affect the financial performance of companies in twelve industries. As a result, CR-COPEC can distinguish causal sentences across industries by taking the distinctive narratives of each industry into consideration. We also provide an extensive analysis of how well the CR-COPEC dataset is constructed and how well suited it is to classifying target sentences as causal with respect to industry characteristics. Our dataset and experimental code are publicly available.