While pre-trained language models can generate individually fluent sentences for automatic story generation, they struggle to generate stories that are coherent, sensible and interesting. Current state-of-the-art (SOTA) story generation models use higher-level features such as plots or commonsense knowledge to improve the quality of generated stories. Prompt-based learning using very large pre-trained language models (VLPLMs) such as GPT3 has demonstrated impressive performance across various NLP tasks. In this paper, we present an extensive study using automatic and human evaluation to compare the story generation capability of VLPLMs with that of SOTA models on three datasets whose stories differ in style, register and length. Our results show that VLPLMs generate stories of much higher quality than other story generation models and, to a certain extent, rival human authors, although a preliminary investigation also reveals that they tend to ``plagiarise'' real stories in scenarios that involve world knowledge.