Event extraction is an important natural language processing (NLP) task that involves identifying events in unstructured text. Although many works address event extraction from news articles, clinical text, and similar sources, only a few focus on event extraction from literary content. Detecting events in short stories poses several challenges for current systems. The distribution of events differs from that of other domains, and stories portray diverse emotional states. This paper presents \texttt{Vrittanta-EN}, a collection of 1,000 English short stories annotated for real events. Research in this area could yield techniques and resources that help literary scholars work more effectively, and it could also benefit the field of NLP. Our objective is to clarify the intricate notion of an event in the context of short stories. Toward this objective, we collected 1,000 short stories, written mostly for children in the Indian context. We then propose new guidelines for annotating event mentions and their categories, organized into \textit{seven distinct classes}: {\tt{COGNITIVE-MENTAL-STATE(CMS), COMMUNICATION(COM), CONFLICT(CON), GENERAL-ACTIVITY(GA), LIFE-EVENT(LE), MOVEMENT(MOV), and OTHERS(OTH)}}. We apply these guidelines to annotate the short story dataset and then evaluate baseline methods for automatically detecting and classifying events. We also propose a prompt-based method for event detection and classification. The proposed method outperforms the baselines and improves on them by more than 4\% for the \texttt{CONFLICT} class in the event classification task.