Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict storytelling at the document and span level. Our dataset is sampled from hundreds of popular English-language Reddit communities ranging across 33 topic categories, and it contains fine-grained expert annotations, including binary story labels, story spans, and event spans. We evaluate a range of detection methods using our data, and we identify the distinctive textual features of online storytelling, focusing on storytelling span detection, which we introduce as a new task. We illuminate distributional characteristics of storytelling on a large community-centric social media platform, and we also conduct a case study on r/ChangeMyView, where storytelling is used as one of many persuasive strategies, illustrating that our data and models can be used for both inter- and intra-community research. Finally, we discuss implications of our tools and analyses for narratology and the study of online communities.