Powerful large language models (LLMs) have enabled writing assistants that promise to significantly improve the quality and efficiency of composition and communication. However, a barrier to effective assistance is the lack of personalization in LLM outputs to the author's communication style, specialized knowledge, and values. In this paper, we address this challenge by proposing Pearl, an LLM writing assistant personalized with a retriever trained to be generation-calibrated for personalization. Generation calibration ensures that our retriever selects historical user-authored documents to augment an LLM prompt such that they are likely to help the generated text better adhere to the user's preferences. We propose two key novelties for training such a retriever: (1) a training data selection method that identifies user requests likely to benefit from personalization, along with the documents that provide that benefit; and (2) a scale-calibrating KL-divergence objective that keeps retriever scores proportional to the downstream generation quality obtained by using a document for personalized generation. In a series of holistic evaluations, we demonstrate the effectiveness of Pearl at generating long-form text on multiple social media datasets. Finally, we show how a generation-calibrated retriever can double as a performance predictor, detecting low-quality retrieval and improving potentially under-performing outputs via revision with LLMs.
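A minimal sketch of what a scale-calibrating KL-divergence objective could look like, assuming it matches the distribution implied by the retriever's document scores to the distribution implied by downstream generation quality. The function names and the use of a shared softmax temperature are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(xs, temp=1.0):
    # Numerically stable softmax over a list of raw scores.
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def scale_calibrated_kl(retriever_scores, quality_scores, temp=1.0):
    """Illustrative loss: KL(target || predicted), where the target
    distribution comes from downstream generation quality and the
    predicted distribution comes from retriever scores. Minimizing it
    pushes retriever scores to stay proportional to generation quality,
    not merely to rank documents in the right order."""
    p = softmax(quality_scores, temp)    # target: generation quality
    q = softmax(retriever_scores, temp)  # predicted: retriever scores
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when retriever scores already mirror the quality scores, and grows as the two distributions diverge; this scale sensitivity is what lets the same scores later serve as a performance predictor for retrieved documents.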