PORTRAIT: a hybrid aPproach tO cReate extractive ground-TRuth summAry for dIsaster evenT

Disaster summarization approaches provide an overview of the important information posted on social media platforms, such as Twitter, during disaster events. However, the type of information posted varies significantly across disasters, depending on factors such as location, disaster type, and severity. Verifying the effectiveness of disaster summarization approaches is still hampered by the lack of a broad spectrum of datasets with accompanying ground-truth summaries. Existing approaches for generating ground-truth summaries for extractive summarization rely on the wisdom and intuition of the annotators: each annotator is given the complete set of input tweets and selects a subset of them for the summary. This process requires immense human effort and significant time. Additionally, this intuition-based selection of tweets can lead to high variance among the summaries produced by different annotators. To handle these challenges, we propose a hybrid (semi-automated) approach, PORTRAIT, which partly automates the ground-truth summary generation procedure. This approach reduces the effort and time required of the annotators while ensuring the quality of the created ground-truth summary. We validate the effectiveness of PORTRAIT on 5 disaster events through quantitative and qualitative comparisons of the ground-truth summaries generated by existing intuitive approaches, a semi-automated approach, and PORTRAIT. We prepare and release ground-truth summaries for 5 disaster events, comprising both natural and man-made disasters from 4 different countries. Finally, we study the performance of various state-of-the-art summarization approaches on the ground-truth summaries generated by PORTRAIT, using ROUGE-N F1-scores.
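For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of how a ROUGE-N F1-score can be computed between a system summary and a ground-truth summary. It is an illustration only, not the evaluation code used in this work: the function names `ngrams` and `rouge_n_f1`, the lowercase whitespace tokenization, and the example sentences are all assumptions, and published ROUGE toolkits typically apply additional preprocessing such as stemming.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two texts.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example texts, not taken from any released dataset.
system_summary = "flood waters rising in city"
ground_truth = "flood waters rising near the city"
print(rouge_n_f1(system_summary, ground_truth, n=1))
print(rouge_n_f1(system_summary, ground_truth, n=2))
```

With these example texts, 4 of the 5 candidate unigrams also appear among the 6 reference unigrams, so precision is 4/5 and recall is 4/6; the F1-score is their harmonic mean.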
