Fire incident reports contain detailed textual narratives that capture causal factors often overlooked in structured records, while financial damage amounts provide measurable outcomes of these events. Integrating these two sources of information is essential for uncovering interpretable links between descriptive causes and their economic consequences. To this end, we develop a data-driven framework that constructs a composite Risk Index, enabling systematic quantification of how specific keywords relate to property damage amounts. The index supports both the identification of high-impact terms and the aggregation of risk across semantically related clusters, thereby offering a principled measure of fire-related financial risk. Using more than a decade (2013 through 2024) of Korean fire investigation reports on chemical-industry facilities classified as Special Buildings, we employ topic modeling and network-based embedding to estimate semantic similarities from word interactions, and then apply Lasso regression to quantify the association of each term with property damage amounts, from which the fire Risk Index is estimated. This approach assesses fire risk not only at the level of individual terms but also within their broader textual context, where groups of strongly interacting words reveal collective patterns of hazard representation and their potential impact on expected losses. The analysis highlights several domains of risk, including hazardous chemical leakage, unsafe storage practices, equipment and facility malfunctions, and environmentally induced ignition. The results demonstrate that text-derived indices provide interpretable and practically relevant insights, bridging unstructured narratives with structured loss information and offering a basis for evidence-based fire risk assessment and management.
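To make the regression and aggregation steps more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each report is reduced to binary keyword indicators, regresses log property damage on them with a Lasso solved by cyclic coordinate descent (using the same (1/(2n)) squared-error scaling that scikit-learn's `Lasso` uses for `alpha`), and aggregates coefficients within a given cluster by simple summation. The function names `lasso_cd` and `cluster_risk`, the toy vocabulary and data, and the summation rule are hypothetical; the topic-modeling and network-embedding steps that would produce the clusters are not shown.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Fit a Lasso model by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y_c - X_c b||^2 + lam * ||b||_1 on column-centered
    data, so the intercept is left unpenalized.
    X: n rows of p keyword indicators; y: n responses (e.g. log damage).
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y_c - X_c b, with b = 0 at start
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
    intercept = y_mean - sum(x_mean[j] * b[j] for j in range(p))
    return intercept, b


def cluster_risk(coefs, vocab, clusters):
    """Hypothetical aggregation rule: sum the coefficients of each cluster's terms."""
    position = {term: k for k, term in enumerate(vocab)}
    return {name: sum(coefs[position[t]] for t in terms)
            for name, terms in clusters.items()}


# Toy data (invented for illustration): 6 reports x 4 keywords.
vocab = ["leak", "storage", "spark", "lightning"]
X = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
# Synthetic log-damage: only "leak" and "storage" actually matter here.
y = [10 + 2 * row[0] + 0.5 * row[1] for row in X]

intercept, coefs = lasso_cd(X, y, lam=0.01)
clusters = {
    "chemical leakage": ["leak"],
    "unsafe storage": ["storage"],
    "ignition sources": ["spark", "lightning"],
}
risk = cluster_risk(coefs, vocab, clusters)
```

With this toy data the Lasso keeps `leak` (about 1.96) and `storage` (about 0.455) and sets the `spark` and `lightning` coefficients exactly to zero, which illustrates how the L1 penalty separates high-impact terms from uninformative ones before any cluster-level aggregation.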