Accurately extracting structured data from structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and for improving the efficiency of financial applications. First, we propose a new method for recognizing structure diagrams in financial announcements that better detects and extracts different types of connecting lines, including straight lines, curves, and polylines of different orientations and angles. Second, we develop a two-stage method to efficiently generate the industry's first benchmark of structure diagrams from Chinese financial announcements. In the first stage, a large number of diagrams are synthesized and annotated with an automated tool and used to train a preliminary recognition model with fairly good performance. In the second stage, the preliminary model automatically annotates real-world structure diagrams, and a few manual corrections then yield a high-quality benchmark. Finally, we experimentally verify that our structure diagram recognition method significantly outperforms previous methods.
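The two-stage benchmark construction described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `synthesize`, `train`, and `build_benchmark` are hypothetical, a diagram is reduced to a list of node names, and the "model" is a trivial placeholder rather than the recognition model the authors train. It only shows the data flow: synthetic diagrams carry labels by construction, a preliminary model is fitted on them, and that model pre-annotates real diagrams before a correction step.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Toy stand-ins (assumptions, not the paper's data format):
# a diagram is a list of node names, an annotation is a list of (parent, child) edges.
Diagram = List[str]
Annotation = List[Tuple[str, str]]
Model = Callable[[Diagram], Annotation]


def synthesize(n_nodes: int, rng: random.Random) -> Tuple[Diagram, Annotation]:
    """Stage 1a: generate a synthetic diagram whose labels are known by construction."""
    nodes = [f"entity_{i}" for i in range(n_nodes)]
    # Each non-root node gets one randomly chosen earlier node as its parent.
    edges = [(nodes[rng.randrange(i)], nodes[i]) for i in range(1, n_nodes)]
    return nodes, edges


def train(samples: Sequence[Tuple[Diagram, Annotation]]) -> Model:
    """Stage 1b: placeholder for training; a real model would learn from `samples`."""
    def model(diagram: Diagram) -> Annotation:
        # Trivial prediction: link each node to the one listed before it.
        return [(diagram[i - 1], diagram[i]) for i in range(1, len(diagram))]
    return model


def build_benchmark(
    real_diagrams: Sequence[Diagram],
    model: Model,
    correct: Callable[[Diagram, Annotation], Annotation],
) -> List[Tuple[Diagram, Annotation]]:
    """Stage 2: pre-annotate real diagrams with the model, then apply manual corrections."""
    return [(d, correct(d, model(d))) for d in real_diagrams]
```

In this sketch the `correct` callback stands in for the human reviewer: it receives the model's draft annotation and returns the fixed version, so the manual effort is limited to the cases where the draft is wrong.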