Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over a structured preference memory. The vectors are updated online from weak scalar rewards derived from user feedback, enabling personalization without per-user fine-tuning. We evaluate on \textsc{MultiSessionCollab}, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.
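To make the dual-vector idea concrete, the following is a minimal illustrative sketch, not the released VARS implementation. It assumes that the retrieval score is a cosine similarity plus two additive bias terms, that each reward nudges both vectors toward the embedding of the retrieved memory, and that the short-term vector is reset at session end. The names `UserState`, `beta_long`, `beta_short`, `lr_long`, and `lr_short` are hypothetical, and the actual scoring and update rules may differ.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


class UserState:
    """Hypothetical per-user state: one long-term and one short-term vector."""

    def __init__(self, dim, beta_long=0.5, beta_short=0.5):
        self.z_long = [0.0] * dim    # persists across sessions
        self.z_short = [0.0] * dim   # reset at the end of each session
        self.beta_long = beta_long   # assumed weight of the long-term bias
        self.beta_short = beta_short # assumed weight of the short-term bias

    def score(self, query, memory):
        """Base similarity plus user-specific bias (assumed additive form)."""
        bias = (self.beta_long * dot(self.z_long, memory)
                + self.beta_short * dot(self.z_short, memory))
        return cosine(query, memory) + bias

    def rank(self, query, memories):
        """Indices of memory embeddings, highest score first."""
        return sorted(range(len(memories)),
                      key=lambda i: self.score(query, memories[i]),
                      reverse=True)

    def update(self, memory, reward, lr_long=0.05, lr_short=0.5):
        """Move both vectors along a retrieved memory, scaled by a scalar reward.

        The long-term vector uses a smaller step so that it changes slowly.
        """
        for i, m in enumerate(memory):
            self.z_long[i] += lr_long * reward * m
            self.z_short[i] += lr_short * reward * m

    def end_session(self):
        """Discard session-specific adaptation, keep the long-term vector."""
        self.z_short = [0.0] * len(self.z_short)
```

In this sketch the backbone and the memory embeddings stay frozen; only the two small per-user vectors change, which is what allows personalization without per-user fine-tuning.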