Financial institutions manage operational risk (OpRisk) by carrying out the activities required by regulation, such as collecting loss data, calculating capital requirements, and reporting. The information needed for these activities is collected in OpRisk databases, which record for each event the loss amount, dates, organizational units involved, event type, and a description. In recent years, operational risk functions have been required to go beyond these regulatory tasks and manage operational risk proactively, preventing or mitigating its impact. Because OpRisk databases also contain event descriptions, usually stored as free-text fields, one opportunity is to exploit all the information contained in these records. To the best of our knowledge, the present work is the first to apply text analysis techniques to OpRisk event descriptions, thereby complementing and enriching the established framework of statistical methods based on quantitative data. Specifically, we apply text analysis methodologies to extract information from the descriptions in an OpRisk database. After delicate preprocessing tasks such as data cleaning, text vectorization, and semantic adjustment, we apply dimensionality reduction methods and several clustering models and algorithms, and compare their performance and weaknesses. Our results improve retrospective knowledge of loss events and help mitigate future risks.
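To make the vectorize-then-cluster idea concrete, here is a minimal, stdlib-only sketch. It is not the pipeline used in this work: the sample event descriptions, the stop-word list, and the helper names (`clean`, `tfidf`, `spherical_kmeans`) are hypothetical. TF-IDF weighting and cosine-based (spherical) k-means are chosen only as common, simple illustrations of text vectorization and clustering. The dimensionality reduction and semantic adjustment steps are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real OpRisk corpus would need a curated,
# language-specific list (event descriptions are often not in English).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "at", "by",
              "for", "after", "was", "were", "is"}

# Hypothetical free-text event descriptions, invented for illustration.
DESCRIPTIONS = [
    "Card fraud detected at ATM, customer card cloned",
    "Cloned card used for fraud at ATM",
    "Customer reported ATM card fraud",
    "Payment system outage after server failure",
    "Server failure caused payment system outage",
    "System outage in payment server",
]


def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def normalise(vec):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def tfidf(docs):
    """Return the vocabulary and one unit-length TF-IDF vector per document."""
    tokenised = [clean(d) for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenised for t in set(doc))
    # Smoothed IDF: a term appearing in every document still gets weight 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenised:
        tf = Counter(doc)
        vectors.append(normalise([tf[t] * idf[t] for t in vocab]))
    return vocab, vectors


def spherical_kmeans(vectors, k, iters=20):
    """Cluster unit vectors by cosine similarity (k-means on the unit sphere).

    Centroids are seeded deterministically with a farthest-first rule: start
    from the first document, then repeatedly add the document least similar
    to the centroids chosen so far.
    """
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(iters):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid becomes the normalised mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                dim = len(members[0])
                mean = [sum(m[i] for m in members) / len(members)
                        for i in range(dim)]
                centroids[j] = normalise(mean)
    return labels


if __name__ == "__main__":
    vocab, X = tfidf(DESCRIPTIONS)
    print(spherical_kmeans(X, k=2))  # card-fraud vs. system-outage events
```

On this toy sample the two groups share no terms, so the fraud descriptions and the outage descriptions fall into separate clusters. On real OpRisk text, vocabularies overlap heavily, which is one reason to add dimensionality reduction and to compare several clustering algorithms rather than rely on a single one.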