IKDSumm: Incorporating Key-phrases into BERT for Extractive Disaster Tweet Summarization

Online social media platforms, such as Twitter, are among the most valuable sources of information during disaster events. Humanitarian organizations, government agencies, and volunteers therefore rely on summaries of this information, i.e., of the posted tweets, for effective disaster management. Although several supervised and unsupervised approaches to automated tweet summarization exist, they either require extensive labeled data or do not incorporate domain-specific knowledge of disasters. Additionally, the most recent approaches to disaster summarization use BERT-based models to improve summary quality. To improve performance further, we use domain-specific knowledge, without any human effort, to estimate the importance (salience) of each tweet, which in turn guides summary creation and improves summary quality. In this paper, we propose IKDSumm, a disaster-specific tweet summarization framework that first identifies the crucial information in each disaster-related tweet through the key-phrases of that tweet. We identify these key-phrases by exploiting domain knowledge of disasters (using an existing ontology), again without human intervention. We then use these key-phrases to automatically generate a summary of the tweets. Thus, given a set of tweets related to a disaster, IKDSumm fulfills the key summarization objectives, such as information coverage, relevance, and diversity, without any human intervention. We evaluate IKDSumm against 8 state-of-the-art techniques on 12 disaster datasets. The results show that IKDSumm outperforms existing techniques by approximately 2-79% in terms of ROUGE-N F1-score.
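ROUGE-N F1, the metric in which the improvements above are reported, compares the n-grams of a system summary with those of a reference summary. Precision is the clipped n-gram overlap divided by the number of n-grams in the system summary, recall is the same overlap divided by the number of n-grams in the reference, and F1 is their harmonic mean. The Python sketch below illustrates this computation under simplifying assumptions (lowercase whitespace tokenization, a single reference summary, no stemming). It is not the evaluation code used for IKDSumm; established packages such as `rouge-score` apply additional preprocessing, so their scores can differ. The function names and the toy tweets are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between a candidate summary and one reference summary.

    Assumption: lowercase whitespace tokenization, no stemming or
    stop-word removal, and a single reference summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: Counter '&' keeps min(candidate count, reference count).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0  # also covers texts shorter than n tokens
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy summaries, for illustration only.
    system = "bridge collapsed near river rescue teams deployed"
    gold = "rescue teams deployed after bridge collapsed"
    print(round(rouge_n_f1(system, gold, n=1), 3))  # 0.769
    print(round(rouge_n_f1(system, gold, n=2), 3))  # 0.545
```

Clipping the overlap with `Counter` intersection prevents a summary from being rewarded for repeating the same n-gram more often than it appears in the reference, which matters for tweets, where retweets and near-duplicates are common.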
