Tailoring outputs from large language models, like ChatGPT, to implicit user preferences remains a challenge despite their impressive generative capabilities. In this paper, we propose a tri-agent generation pipeline comprising a generator, an instructor, and an editor to enhance output personalization. The generator produces an initial output, the instructor automatically generates editing instructions based on user preferences, and the editor refines the output to align with those preferences. An inference-only large language model (ChatGPT) serves as both the generator and the editor, while a smaller model acts as the instructor that guides output generation. We train the instructor using editor-steered reinforcement learning, leveraging feedback from the large-scale editor model to optimize instruction generation. Experimental results on two abstractive summarization datasets demonstrate the effectiveness of our approach in generating outputs that better meet user expectations. Code is available at \url{https://github.com/Wendy-Xiao/chatgpt_editing_summ}.
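
The tri-agent pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (`tri_agent_pipeline`, `editor_steered_reward`, the `toy_*` functions, `PREFERRED_KEYWORDS`) is hypothetical, and the toy agents are simple string functions standing in for ChatGPT and the smaller instructor model. The reward function reflects one plausible reading of "editor-steered" training, namely that an instruction is scored by the quality of the output the editor produces when following it; the abstract does not specify the actual reward.

```python
from typing import Callable

# Hypothetical type aliases: each agent is modeled as a plain text function.
Generator = Callable[[str], str]             # document -> initial output
Instructor = Callable[[str, str], str]       # (document, draft) -> editing instruction
Editor = Callable[[str, str, str], str]      # (document, draft, instruction) -> edited output
Scorer = Callable[[str, str], float]         # (output, reference) -> quality score


def tri_agent_pipeline(document: str, generate: Generator,
                       instruct: Instructor, edit: Editor) -> str:
    """Generator drafts, instructor proposes an edit, editor applies it."""
    draft = generate(document)
    instruction = instruct(document, draft)
    return edit(document, draft, instruction)


def editor_steered_reward(document: str, draft: str, instruction: str,
                          edit: Editor, score: Scorer, reference: str) -> float:
    """Assumed reward: score the editor's output after following the instruction."""
    edited = edit(document, draft, instruction)
    return score(edited, reference)


# --- Toy stand-ins (assumptions for illustration only) ---

# Pretend the user cares about these topics appearing in the summary.
PREFERRED_KEYWORDS = ("deadline",)


def toy_generate(document: str) -> str:
    # Use the first sentence as the initial summary.
    return document.split(".")[0].strip() + "."


def toy_instruct(document: str, draft: str) -> str:
    # Ask for preferred keywords that the document covers but the draft omits.
    missing = [k for k in PREFERRED_KEYWORDS if k in document and k not in draft]
    return "add: " + ", ".join(missing) if missing else "keep"


def toy_edit(document: str, draft: str, instruction: str) -> str:
    # Append every source sentence that mentions a requested keyword.
    if not instruction.startswith("add: "):
        return draft
    keywords = instruction[len("add: "):].split(", ")
    extra = [s.strip() + "." for s in document.split(".")
             if any(k in s for k in keywords)]
    return " ".join([draft] + extra)


def toy_score(output: str, reference: str) -> float:
    # Fraction of reference tokens that also appear in the output.
    out, ref = set(output.lower().split()), set(reference.lower().split())
    return len(out & ref) / len(ref) if ref else 0.0


if __name__ == "__main__":
    doc = "The team shipped the release. The deadline for feedback is Friday."
    print(tri_agent_pipeline(doc, toy_generate, toy_instruct, toy_edit))
```

In a real system the generator and editor calls would be prompts to the inference-only model, and the reward would feed a reinforcement learning update of the instructor; both are left out here because the abstract gives no detail on them.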