Accurately extracting structured data from the structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and, in turn, for improving the efficiency of a wide range of financial applications. First, we propose a new method for recognizing structure diagrams in financial announcements that detects and extracts different types of connecting lines better than previous methods, including straight lines, curves, and polylines of varying orientations and angles. Second, we develop a two-stage method to efficiently build the industry's first benchmark of structure diagrams from Chinese financial announcements. In the first stage, a large number of diagrams are synthesized and annotated with an automated tool and used to train a preliminary recognition model with reasonably good performance. In the second stage, this preliminary model automatically annotates real-world structure diagrams, and a high-quality benchmark is obtained after only a small number of manual corrections. Finally, our experiments confirm that the proposed structure diagram recognition method significantly outperforms previous methods.
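The two-stage benchmark construction can be sketched as a generic bootstrapping loop. This is a minimal, hypothetical illustration, not the authors' implementation: the `Annotation` schema (entity names as nodes, connecting lines as node pairs) and the `synthesize`, `train`, `predict`, and `review` callables are placeholders assumed here; in practice they would stand in for the synthesis tool, the recognition model's training and inference code, and the manual correction step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Annotation:
    """Hypothetical label schema: entity boxes as nodes, connecting lines as edges."""
    image_id: str
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)
    source: str = "synthetic"  # "synthetic", "auto", or "corrected"


def build_benchmark(
    synthesize: Callable[[int], Annotation],
    train: Callable[[List[Annotation]], object],
    predict: Callable[[object, str], Annotation],
    review: Callable[[Annotation], Annotation],
    real_image_ids: Sequence[str],
    n_synthetic: int,
) -> Tuple[object, List[Annotation]]:
    # Stage 1: synthesize diagrams whose labels are known by construction,
    # then train a preliminary recognition model on them.
    synthetic = [synthesize(i) for i in range(n_synthetic)]
    model = train(synthetic)
    # Stage 2: the preliminary model pre-annotates each real diagram and a
    # reviewer corrects the remaining errors in the draft annotation.
    benchmark = [review(predict(model, img_id)) for img_id in real_image_ids]
    return model, benchmark


# Toy stand-ins (all hypothetical) so the sketch runs end to end.
def toy_synthesize(i: int) -> Annotation:
    return Annotation(f"syn-{i}", ["A", "B"], [("A", "B")], "synthetic")


def toy_train(data: List[Annotation]) -> dict:
    return {"n_train": len(data)}


def toy_predict(model: object, img_id: str) -> Annotation:
    # Pretend the model finds both entities but misses the connecting line.
    return Annotation(img_id, ["Parent", "Sub"], [], "auto")


def toy_review(draft: Annotation) -> Annotation:
    # The reviewer adds the missed line: a small manual correction.
    return Annotation(draft.image_id, draft.nodes,
                      draft.edges + [("Parent", "Sub")], "corrected")


model, bench = build_benchmark(toy_synthesize, toy_train, toy_predict,
                               toy_review, ["r1", "r2"], 100)
print(model, len(bench), bench[0].edges)
```

Passing the stages in as callables keeps the loop agnostic to the particular recognition model; the point it illustrates is that manual effort is confined to `review`, which only edits drafts the preliminary model has already produced.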