Tailoring the outputs of large language models such as ChatGPT to specific user needs remains a challenge despite their impressive generation quality. In this paper, we propose a tri-agent generation pipeline, consisting of a generator, an instructor, and an editor, to enhance the customization of generated outputs. The generator produces an initial output, the user-specific instructor generates editing instructions, and the editor produces a revised output aligned with user preferences. An inference-only large language model (ChatGPT) serves as both the generator and the editor, while a smaller model acts as the user-specific instructor that steers generation toward user needs. The instructor is trained with editor-steered reinforcement learning, which uses feedback from the large-scale editor model to optimize instruction generation. Experimental results on two abstractive summarization datasets demonstrate the effectiveness of our approach in generating outputs that better fulfill user expectations.
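The abstract describes the architecture only at a high level. The following Python code is a minimal sketch of how the three agents could be wired together and how an instructor could be trained from the editor's feedback. It rests on heavy simplifying assumptions and is not the paper's implementation. All agents are plain callables, and no real ChatGPT API is called. The names `TriAgentPipeline`, `toy_generator`, `toy_editor`, `unigram_f1`, and `train_instructor` are hypothetical. A unigram F1 score against a user-written reference stands in for ROUGE. The instructor is reduced to a softmax policy over a fixed list of candidate instructions, trained with the exact (expected) policy gradient rather than a sampled REINFORCE estimate over generated text.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical agent interfaces: each agent is a plain text-to-text callable.
Generator = Callable[[str], str]          # document -> initial output
Instructor = Callable[[str, str], str]    # (document, draft) -> editing instruction
Editor = Callable[[str, str, str], str]   # (document, draft, instruction) -> revision


@dataclass
class TriAgentPipeline:
    generator: Generator
    instructor: Instructor
    editor: Editor

    def run(self, document: str) -> Dict[str, str]:
        draft = self.generator(document)                     # large LLM, inference only
        instruction = self.instructor(document, draft)       # small user-specific model
        revised = self.editor(document, draft, instruction)  # large LLM applies the edit
        return {"draft": draft, "instruction": instruction, "revised": revised}


# Toy stand-ins so the sketch runs offline; no real LLM is called.
def toy_generator(document: str) -> str:
    # Use the first sentence as the "summary".
    return document.split(". ")[0].rstrip(".") + "."


def toy_editor(document: str, draft: str, instruction: str) -> str:
    # Follow an instruction of the form "Mention: <keyword>".
    keyword = instruction.split(":")[-1].strip()
    return f"{draft[:-1]} and discussed the {keyword}."


def unigram_f1(candidate: str, reference: str) -> float:
    # Crude stand-in for ROUGE: F1 over lowercase word tokens.
    cand = re.findall(r"\w+", candidate.lower())
    ref = re.findall(r"\w+", reference.lower())
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def softmax(logits: List[float]) -> List[float]:
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_instructor(document: str, reference: str, candidates: List[str],
                     steps: int = 200, lr: float = 1.0) -> List[float]:
    """Editor-steered policy gradient over a fixed set of candidate instructions."""
    draft = toy_generator(document)
    # Each instruction is rewarded by scoring the EDITOR's revision against the
    # user's reference, not by scoring the instruction text itself.
    rewards = [unigram_f1(toy_editor(document, draft, c), reference) for c in candidates]
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact softmax policy gradient: dE[r]/dlogit_j = p_j * (r_j - E[r]).
        logits = [x + lr * p * (r - expected) for x, p, r in zip(logits, probs, rewards)]
    return softmax(logits)


# Usage on a single toy document with a user-written reference summary.
doc = "The council met on Monday. It approved the new budget."
ref = "The council approved the budget on Monday."
candidates = ["Mention: weather", "Mention: budget", "Mention: parade"]
probs = train_instructor(doc, ref, candidates)
best = candidates[probs.index(max(probs))]

pipeline = TriAgentPipeline(toy_generator, lambda d, s: best, toy_editor)
out = pipeline.run(doc)
print(best, "->", out["revised"])
```

The sketch tries to capture one design point: the instructor's reward is measured on the editor's revised output. The small model is therefore pushed toward instructions that the large editor actually turns into outputs closer to what the user wants. In this toy setup, "Mention: budget" ends up with the highest probability because the editor's revision under that instruction overlaps most with the reference.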