The increasing demand for personalized interactions with large language models (LLMs) calls for methodologies that can accurately and efficiently identify user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the cost of fine-tuning. Existing research, however, has largely focused on enhancing the retrieval stage and has devoted limited attention to optimizing the representation of the database, a crucial aspect for tasks such as personalization. In this work, we examine the problem from a novel angle, focusing on how data can be better represented for more data-efficient retrieval in the context of LLM customization. To tackle this challenge, we introduce Persona-DB, a simple yet effective framework consisting of a hierarchical construction process that improves generalization across task contexts and a collaborative refinement process that bridges knowledge gaps among users. In the evaluation of response prediction, Persona-DB demonstrates superior context efficiency, maintaining accuracy with a significantly reduced retrieval size, a critical advantage in scenarios with extensive histories or limited context windows. Our experiments also indicate a marked improvement of over 10% in cold-start scenarios, where users have extremely sparse data. Furthermore, our analysis reveals that collaborative knowledge becomes increasingly important as the retrieval capacity expands.