During natural disasters, people often use social media platforms such as Twitter to ask for help, to provide information about the disaster situation, or to express contempt for the unfolding event or for public policies and guidelines. This contempt is sometimes expressed as sarcasm or irony. Understanding this form of speech in a disaster-centric context is essential to improving natural language understanding of disaster-related tweets. In this paper, we introduce HurricaneSARC, a dataset of 15,000 tweets annotated for intended sarcasm, and provide a comprehensive investigation of sarcasm detection using pre-trained language models. Our best model achieves an F1 score of up to 0.70 on our dataset. We also demonstrate that performance on HurricaneSARC can be improved by leveraging intermediate task transfer learning. We release our data and code at https://github.com/tsosea2/HurricaneSarc.