Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.