In this paper, we present a variety of classification experiments on the task of fictional discourse detection. We use a diverse array of datasets, including contemporary professionally published fiction, historical fiction from the Hathi Trust, fanfiction, stories from Reddit, folk tales, GPT-generated stories, and anglophone world literature. Additionally, we introduce a new feature set of word "supersenses" that facilitates semantic generalization. Detecting fictional discourse can enrich our knowledge of large cultural heritage archives and help us understand the distinctive qualities of fictional storytelling more broadly.