Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words. We study the problem of automatically generating story plots, which include a story premise, character descriptions, plot outlines, etc. To generate a single engaging plot, existing plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands of calls to LLMs (e.g., via the OpenAI API) in the planning stage of the story plot, which is costly and takes at least several minutes. Moreover, the hard-wired nature of the method makes the pipeline non-differentiable, which blocks fast specialization and personalization of the plot generator. In this paper, we propose three models, $\texttt{OpenPlot}$, $\texttt{E2EPlot}$, and $\texttt{RLPlot}$, to address these challenges. $\texttt{OpenPlot}$ replaces expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful prompt design, which enables inexpensive generation of high-quality training datasets of story plots. We then train an end-to-end story plot generator, $\texttt{E2EPlot}$, by supervised fine-tuning (SFT) on approximately 13,000 story plots generated by $\texttt{OpenPlot}$. $\texttt{E2EPlot}$ generates story plots of comparable quality to $\texttt{OpenPlot}$ and is more than 10$\times$ faster (1k tokens in only 30 seconds on average). Finally, we obtain $\texttt{RLPlot}$ by further fine-tuning $\texttt{E2EPlot}$ with RLHF on several reward models for different aspects of story quality, which yields a 60.0$\%$ win rate against $\texttt{E2EPlot}$ along the aspect of suspense and surprise.