In an ever-evolving world, the dynamic nature of knowledge poses a challenge for language models (LMs) trained on static data, whose encoded information becomes outdated. Real-world scenarios, however, require models not only to acquire new knowledge but also to overwrite outdated information with updated information. To address this under-explored issue, we introduce EvolvingQA, a novel temporally evolving question answering benchmark for training and evaluating LMs on an evolving Wikipedia database. The benchmark is constructed automatically by a pipeline that uses large language models, and it incorporates question answering as a downstream task to emulate real-world applications. Through EvolvingQA, we find that existing continual learning baselines have difficulty updating and forgetting outdated knowledge. Our findings suggest that the models fail to learn updated knowledge because of small weight gradients. Furthermore, we show that the models struggle most when questions asking for updated knowledge require numerical or temporal answers. Our work aims to model the dynamic nature of real-world information, offering a robust measure of the evolution-adaptability of language models.