Prior work in style-controlled text generation has focused on tasks such as emulating the style of prolific literary authors, producing formal or informal text, and controlling the degree of toxicity of generated text. Plentiful demonstrations of these styles are available, and as a result modern language models are often able to emulate them, either via prompting or via discriminative control. However, in applications such as writing assistants, it is desirable for language models to produce text in an author-specific style on the basis of a small writing sample. We find that instruction-tuned language models can struggle to reproduce an author-specific style demonstrated in a prompt. Instead, we propose to guide a language model to generate text in a target style using contrastively trained representations that capture stylometric features. A central challenge in doing so is that an author's writing is characterized by token choices that are surprising under a generic language model. To reconcile this tension, we combine generative re-scoring, which yields an author-specific model, with discriminative control, which ensures style consistency at the sequence level. We find this combination to be particularly effective at adhering to an author-specific style in a variety of conditions, including unconditional generation and style transfer, and it is applicable to any underlying language model without requiring fine-tuning.
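The two ingredients named in the abstract, token-level generative re-scoring and sequence-level discriminative control, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the additive log-score combination with weight `alpha`, and the punctuation-count `toy_style_embedding` (a stand-in for a contrastively trained style encoder) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rescore_next_token(base_logprobs, author_scores, alpha=1.0):
    """Token-level re-scoring (assumed form): add a weighted author-specific
    score to each base log-probability, then renormalize to a distribution."""
    scores = {
        tok: lp + alpha * author_scores.get(tok, 0.0)
        for tok, lp in base_logprobs.items()
    }
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {tok: s - log_z for tok, s in scores.items()}


def toy_style_embedding(text):
    """Hypothetical stand-in for a learned style encoder: punctuation counts."""
    return [text.count("!"), text.count(","), text.count(";")]


def pick_by_style(candidates, target_embedding, embed=toy_style_embedding):
    """Sequence-level control (assumed form): keep the candidate whose
    style embedding is closest to the target author's embedding."""
    return max(candidates, key=lambda c: cosine(embed(c), target_embedding))


# A generic model prefers "good"; the author-specific score boosts "splendid".
base = {"good": math.log(0.6), "fine": math.log(0.3), "splendid": math.log(0.1)}
rescored = rescore_next_token(base, {"splendid": 3.0})
best_token = max(rescored, key=rescored.get)

# Choose among full candidate sequences using the author's writing sample.
target = toy_style_embedding("Well, well, what a day!")
best_sequence = pick_by_style(
    ["What a day!!!", "Well, yes, quite a day!", "It was a day."], target
)
```

The re-scoring step lets a token that is unlikely under the generic model win when the author favours it, while the sequence-level step filters whole candidates for stylistic consistency. In a real system, both the author scores and the embeddings would come from learned models rather than hand-set values.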