Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the LLM that generates personalized output is frozen and can only be accessed through APIs. Under this constraint, all one can do is improve the input text (i.e., text prompts) sent to the LLM, a procedure that is usually done manually. In this paper, we propose a novel method to automatically revise prompts for personalized text generation. The proposed method takes the initial prompts generated by a state-of-the-art, multistage framework for personalized generation and rewrites a few critical components that summarize and synthesize the personal context. The prompt rewriter employs a training paradigm that chains together supervised learning (SL) and reinforcement learning (RL), where SL reduces the search space of RL and RL facilitates end-to-end training of the rewriter. Using datasets from three representative domains, we demonstrate that the rewritten prompts outperform both the original prompts and the prompts optimized via supervised learning or reinforcement learning alone. In-depth analysis of the rewritten prompts shows that they are not only human-readable but can also guide manual revision of prompts when resources are too limited to train the prompt rewriter with reinforcement learning, or when it is costly to deploy an automatic prompt rewriter for inference.