Scaling training data and model parameters has long driven progress in large language models (LLMs), but this paradigm is increasingly constrained by the scarcity of high-quality data and by diminishing returns on rising computational costs. As a result, recent work increasingly focuses on continual learning from real-world deployment, where user interaction logs provide a rich source of authentic human feedback and procedural knowledge. However, learning from user logs is challenging because they are unstructured and noisy. Vanilla LLM systems often struggle to distinguish useful feedback signals from noisy user behavior, and the mismatch between user log collection and model optimization (e.g., the off-policy optimization problem) further exacerbates the problem. To this end, we propose UNO (User log-driveN Optimization), a unified framework for improving LLM systems (LLMsys) with user logs. UNO first distills logs into semi-structured rules and preference pairs, then applies query-and-feedback-driven clustering to manage data heterogeneity, and finally quantifies the cognitive gap between the model's prior knowledge and the log data. This assessment guides the LLMsys to adaptively filter out noisy feedback and to construct separate modules for the primary and reflective experiences extracted from user logs, thereby improving future responses. Extensive experiments show that UNO achieves state-of-the-art effectiveness and efficiency, significantly outperforming Retrieval-Augmented Generation (RAG) and memory-based baselines. Our code is open-sourced at https://github.com/bebr2/UNO.
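The cluster, assess, and route flow described above can be illustrated with a small Python sketch. This is a hypothetical simplification, not the UNO implementation (the authors' code is in the repository linked above): the names `LogEntry`, `cluster_logs`, `cognitive_gap`, and `route_clusters`, the first-word clustering key, the disagreement-fraction gap measure, the `prior_agrees` callback, and both thresholds are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record for one user-log interaction (not UNO's actual schema).
@dataclass
class LogEntry:
    query: str
    feedback: str  # assumed coarse label: "positive" or "negative"

ClusterKey = Tuple[str, str]

def cluster_logs(logs: List[LogEntry]) -> Dict[ClusterKey, List[LogEntry]]:
    """Toy stand-in for query-and-feedback-driven clustering:
    group entries by (first query word, feedback label)."""
    groups: Dict[ClusterKey, List[LogEntry]] = defaultdict(list)
    for entry in logs:
        words = entry.query.lower().split()
        topic = words[0] if words else ""
        groups[(topic, entry.feedback)].append(entry)
    return dict(groups)

def cognitive_gap(entries: List[LogEntry],
                  prior_agrees: Callable[[LogEntry], bool]) -> float:
    """Assumed gap measure: fraction of a cluster's entries that
    conflict with the model's prior."""
    conflicts = sum(1 for e in entries if not prior_agrees(e))
    return conflicts / len(entries)

def route_clusters(groups: Dict[ClusterKey, List[LogEntry]],
                   prior_agrees: Callable[[LogEntry], bool],
                   gap_threshold: float = 0.5,
                   min_support: int = 2) -> Dict[str, List[ClusterKey]]:
    """Assumed routing rule: low-gap clusters become primary experience,
    small high-gap clusters are filtered as noise, and the remaining
    high-gap clusters become reflective experience."""
    routes: Dict[str, List[ClusterKey]] = {"primary": [], "reflective": [], "filtered": []}
    for key, entries in groups.items():
        gap = cognitive_gap(entries, prior_agrees)
        if gap <= gap_threshold:
            routes["primary"].append(key)
        elif len(entries) < min_support:
            routes["filtered"].append(key)
        else:
            routes["reflective"].append(key)
    return routes

# Made-up example data; the toy prior assumes the model believes every answer it gave was good.
logs = [
    LogEntry("python sort list", "positive"),
    LogEntry("python reverse string", "positive"),
    LogEntry("sql join tables", "negative"),
    LogEntry("sql group by", "negative"),
    LogEntry("regex email", "negative"),
]
routes = route_clusters(cluster_logs(logs), prior_agrees=lambda e: e.feedback == "positive")
print(routes)
# {'primary': [('python', 'positive')], 'reflective': [('sql', 'negative')], 'filtered': [('regex', 'negative')]}
```

Assessing the gap per cluster rather than per entry lets a single isolated contradiction be discarded as noise while a recurring pattern of the same feedback is kept for reflection. The abstract does not specify how UNO actually measures the cognitive gap, so the fraction used here is only a placeholder.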