Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals, particularly in real-world scenarios such as scientific writing. To address this challenge, we introduce STEP-BACK PROFILING, which personalizes LLMs by distilling user history into concise profiles that capture users' essential traits and preferences. To conduct our experiments, we construct a Personalized Scientific Writing (PSW) dataset for studying multi-user personalization. PSW requires models to write scientific papers for specified author groups with diverse academic backgrounds. Our results demonstrate the effectiveness of capturing user characteristics via STEP-BACK PROFILING for collaborative writing. Moreover, our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark LaMP, which comprises seven personalized LLM tasks. Our ablation studies validate the contributions of the individual components of our method and provide insights into our task definition. Our dataset and code are available at \url{https://github.com/gersteinlab/step-back-profiling}.
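To make the core idea concrete, the following is a minimal, purely illustrative sketch of the profiling-then-conditioning pipeline described above. It is not the authors' implementation: here a "profile" is approximated by a user's most frequent content words, standing in for the traits and preferences an LLM would distill, and `personalized_prompt` is a hypothetical helper that conditions a writing task on the group's profiles.

```python
from collections import Counter
import re

# A few stopwords to filter out; a real system would use an LLM
# to extract traits rather than simple keyword counts.
STOPWORDS = {"the", "a", "an", "of", "and", "on", "in", "for", "with"}

def step_back_profile(history, top_k=3):
    """Distill a user's document history into a concise profile.

    Illustrative stand-in for STEP-BACK PROFILING: returns the
    user's top_k most frequent content words as their "traits".
    """
    words = re.findall(r"[a-z]+", " ".join(history).lower())
    counts = Counter(w for w in words
                     if w not in STOPWORDS and len(w) > 3)
    return [w for w, _ in counts.most_common(top_k)]

def personalized_prompt(task, profiles):
    """Condition a generation task on the distilled group profiles
    (multi-user setting, as in the PSW task)."""
    lines = [f"Author {i}: interests include {', '.join(p)}"
             for i, p in enumerate(profiles, 1)]
    return "\n".join(lines) + f"\nTask: {task}"
```

A sketch of usage: `profiles = [step_back_profile(h) for h in author_histories]`, after which `personalized_prompt("Write the abstract.", profiles)` yields a prompt that conditions generation on each author's distilled profile rather than their full history.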