LLMs operating in dynamic real-world contexts often encounter knowledge that evolves continuously or emerges incrementally. To remain accurate and effective, models must adapt to newly arriving information on the fly. We introduce Online Adaptation to Continual Knowledge Streams (OAKS), a benchmark that evaluates online adaptation over streaming, continually updating knowledge. The benchmark is structured as a sequence of fine-grained context chunks in which facts change dynamically across time intervals. OAKS comprises two datasets, OAKS-BABI and OAKS-Novel, in which individual facts evolve multiple times across context chunks; dense annotations measure whether models track these changes accurately. Evaluating 14 models with varied inference approaches, we observe significant limitations in current methods: both state-of-the-art models and agentic memory systems fail to adapt robustly on OAKS, exhibiting delayed state tracking and susceptibility to distraction in streaming environments.