Event extraction is an important natural language processing (NLP) task that identifies events in unstructured text. Although a plethora of works deal with event extraction from news articles, clinical text, and similar sources, only a few focus on event extraction from literary content. Detecting events in short stories presents several challenges to current systems, including a distribution of events that differs from other domains and the portrayal of diverse emotional conditions. This paper presents \texttt{Vrittanta-EN}, a collection of 1,000 English short stories annotated for real events. Exploring this field could result in techniques and resources that help literary scholars work more effectively, and could in turn influence the field of NLP. Our objective is to clarify the intricate notion of events in the context of short stories. Towards this objective, we collected 1,000 short stories written mostly for children in the Indian context. Further, we present fresh guidelines for annotating event mentions and their categories, organized into \textit{seven distinct classes}. The classes are {\tt{COGNITIVE-MENTAL-STATE(CMS), COMMUNICATION(COM), CONFLICT(CON), GENERAL-ACTIVITY(GA), LIFE-EVENT(LE), MOVEMENT(MOV), and OTHERS(OTH)}}. We then apply these guidelines to annotate the short story dataset and apply baseline methods for automatically detecting and categorizing events. We also propose a prompt-based method for event detection and classification. The proposed method outperforms the baselines, with a significant improvement of more than 4\% for the \texttt{CONFLICT} class in the event classification task.