Understanding narratives requires identifying which events are most salient for a story's progression. We present a contrastive learning framework for modeling narrative salience that learns story embeddings from narrative twins: stories that share the same plot but differ in surface form. Our model is trained to distinguish a story from both its narrative twin and a distractor with similar surface features but a different plot. Using the resulting embeddings, we evaluate four narratologically motivated operations for inferring salience (deletion, shifting, disruption, and summarization). Experiments on short narratives from the ROCStories corpus and longer Wikipedia plot summaries show that contrastively learned story embeddings outperform a masked-language-model baseline, and that summarization is the most reliable operation for identifying salient sentences. If narrative twins are not available, random dropout can be used to generate twins from a single story. Effective distractors can be obtained either by prompting LLMs or, for long-form narratives, by using different parts of the same story.
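The twin/distractor setup described above can be illustrated with a minimal triplet-style sketch. This is a toy illustration, not the paper's method: a bag-of-words count vector stands in for the learned story encoder, word-level dropout stands in for twin generation, and all function names and example stories are invented for this sketch.

```python
import math
import random

def embed(story: str) -> dict:
    # Toy stand-in for a learned story encoder: bag-of-words counts.
    vec = {}
    for tok in story.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dropout_twin(story: str, p: float = 0.3, seed: int = 0) -> str:
    # Random word dropout as a stand-in "narrative twin"
    # when no genuine twin (same plot, different surface) exists.
    rng = random.Random(seed)
    kept = [tok for tok in story.split() if rng.random() > p]
    return " ".join(kept) or story

def triplet_loss(anchor: dict, twin: dict, distractor: dict,
                 margin: float = 0.2) -> float:
    # Contrastive objective: the twin should be closer to the
    # anchor than the distractor is, by at least the margin.
    return max(0.0, cosine(anchor, distractor) - cosine(anchor, twin) + margin)

# Illustrative stories: the distractor shares surface wording
# with the anchor but ends with a different plot.
story = "Anna lost her keys and searched the whole house before finding them"
distractor = "Anna lost her keys and searched the whole house before giving up"
twin = dropout_twin(story)

loss = triplet_loss(embed(story), embed(twin), embed(distractor))
```

Training would minimize this loss over many (story, twin, distractor) triples, pulling same-plot pairs together in embedding space while pushing surface-similar, different-plot distractors apart.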