Controllable text generation is a challenging and meaningful field in natural language generation (NLG). Poetry generation in particular is a representative task with well-defined, strict constraints, which makes it an ideal testbed for assessing current methodologies. While prior work has succeeded in controlling either the semantic or the metrical aspects of poetry generation, addressing both simultaneously remains a challenge. In this paper, we pioneer the use of the diffusion model for generating sonnets and Chinese SongCi poetry to tackle this challenge. In terms of semantics, our PoetryDiffusion model, built upon the diffusion model, generates entire sentences or poems by considering the information of the whole sentence at once. This approach enhances semantic expression and distinguishes PoetryDiffusion from autoregressive models and large language models (LLMs). For metrical control, the separable nature of diffusion generation and its constraint control module enable us to flexibly incorporate a novel metrical controller to manipulate and evaluate metrics (format and rhythm). The denoising process in PoetryDiffusion allows for gradual enhancement of semantics and flexible integration of the metrical controller, which calculates and imposes penalties on states that stray significantly from the target control distribution. Experimental results on two datasets demonstrate that our model outperforms existing models in automatic evaluation of semantic, metrical, and overall performance, as well as in human evaluation.
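To make the idea of penalty-guided denoising concrete, the following is a minimal toy sketch and not the PoetryDiffusion implementation. It assumes a continuous state vector, a stand-in denoiser that simply shrinks the state, and a hypothetical quadratic penalty measuring distance from a made-up target pattern. All function names (`metrical_penalty`, `penalty_gradient`, `toy_denoiser`, `guided_denoise`) and the `guidance_scale` parameter are illustrative assumptions; the abstract does not specify the actual controller or its loss.

```python
import random


def metrical_penalty(state, target):
    """Hypothetical penalty: half the squared distance from the target pattern."""
    return 0.5 * sum((s - t) ** 2 for s, t in zip(state, target))


def penalty_gradient(state, target):
    """Gradient of metrical_penalty with respect to the state."""
    return [s - t for s, t in zip(state, target)]


def toy_denoiser(state):
    """Stand-in for a learned denoising step: shrink the state toward zero."""
    return [0.9 * s for s in state]


def guided_denoise(noisy_state, target, steps=50, guidance_scale=0.1):
    """Alternate a denoising update with a nudge against the penalty gradient."""
    state = list(noisy_state)
    for _ in range(steps):
        state = toy_denoiser(state)                # unconditional update
        grad = penalty_gradient(state, target)     # controller feedback
        state = [s - guidance_scale * g for s, g in zip(state, grad)]
    return state


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up alternating pattern standing in for a metrical template.
    target = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    noise = [rng.gauss(0.0, 1.0) for _ in target]

    guided = guided_denoise(noise, target)
    unguided = guided_denoise(noise, target, guidance_scale=0.0)

    print("penalty with controller:   ", metrical_penalty(guided, target))
    print("penalty without controller:", metrical_penalty(unguided, target))
```

In this toy setting the controller only pulls the state part of the way toward the target, because the denoiser and the penalty gradient compete at every step. A real system would presumably operate on learned embeddings with a trained denoiser and a trained metrical classifier rather than these stand-ins.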