As part of its digitization initiative, the German Central Bank (Deutsche Bundesbank) wants to examine the extent to which natural language processing (NLP) can be used to make independent decisions on the eligibility criteria of securities prospectuses. Every month, the Directorate General Markets at the German Central Bank receives hundreds of scanned prospectuses in PDF format, which must be processed manually to decide on their eligibility. We found that this tedious and time-consuming process can be (semi-)automated by employing modern NLP model architectures, which learn linguistic feature representations of text in order to identify which eligible and ineligible criteria are present. The proposed Decision Support System provides document-level decisions on the eligibility criteria, accompanied by human-understandable explanations. The aim of this project is to model the described use case and to evaluate the extent to which current NLP research results can be applied to this problem. After creating a heterogeneous, domain-specific dataset containing annotations of eligible and non-eligible mentions of the relevant criteria, we successfully built, trained and deployed a semi-automatic decision model. This model combines transformer-based language models with decision trees, which integrate the established rule-based parts of the decision process. Results suggest that the problem can be modeled efficiently and that more than 90% of decisions can be automated for many of the eligibility criteria considered.
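To make the combination of learned mention classification and rule-based decision logic more concrete, here is a minimal sketch of the document-level step. It assumes that a transformer-based classifier has already labeled each relevant mention in a prospectus as `"eligible"` or `"ineligible"` with a confidence score. The `Mention` type, the `decide` function, the 0.9 confidence threshold and the specific rules below are hypothetical illustrations, not the rules or decision trees used by the Bundesbank. The point is only to show how rules can turn mention-level predictions into a document-level decision with an explanation, and how uncertain cases can be deferred to a human analyst.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mention:
    """A text span relevant to one eligibility criterion (hypothetical type)."""
    text: str
    label: str    # "eligible" or "ineligible", assumed to come from a transformer classifier
    score: float  # assumed classifier confidence in [0, 1]


def decide(mentions: List[Mention], threshold: float = 0.9) -> Tuple[str, str]:
    """Hypothetical rule-based decision for one criterion in one document.

    Returns a (decision, explanation) pair, where decision is one of
    "eligible", "ineligible" or "manual_review".
    """
    confident = [m for m in mentions if m.score >= threshold]
    uncertain = [m for m in mentions if m.score < threshold]

    # Rule 1: a single confident ineligible mention makes the document ineligible.
    for m in confident:
        if m.label == "ineligible":
            return "ineligible", f"Ineligible mention found: '{m.text}' (score {m.score:.2f})"

    # Rule 2: low-confidence mentions are deferred to a human analyst (semi-automation).
    if uncertain:
        return "manual_review", f"{len(uncertain)} mention(s) below confidence threshold {threshold}"

    # Rule 3: only confident eligible mentions remain, so the criterion is met.
    if confident:
        return "eligible", f"{len(confident)} eligible mention(s), no ineligible mention found"

    # Rule 4: no evidence for this criterion at all; a human must decide.
    return "manual_review", "No relevant mention found"


if __name__ == "__main__":
    doc = [Mention("governed by German law", "eligible", 0.95)]
    print(decide(doc))
```

Running the example prints `('eligible', '1 eligible mention(s), no ineligible mention found')`. Routing low-confidence cases to `"manual_review"` rather than forcing a decision is one way to realize a semi-automatic system: the share of documents that never reach that branch corresponds to the degree of automation achieved for a criterion.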