Towards Practical Tool Usage for Continually Learning LLMs

Large language models (LLMs) show an innate skill for solving language-based tasks. However, evidence suggests they cannot adjust when information or task-solving skills become outdated, because their knowledge, stored directly in their parameters, remains static over time. Tool use helps by offloading work to systems that the LLM can access through an interface, but LLMs that use tools must still adapt to nonstationary environments for prolonged use, since new tools can emerge and existing tools can change. Nevertheless, tools require less specialized knowledge, so we hypothesize that tool-using LLMs are better suited for continual learning (CL): they rely less on parametric memory for solving tasks and instead focus on learning when to apply pre-defined tools. To verify this, we develop a synthetic benchmark and then aggregate existing NLP tasks to form a more realistic testing scenario. While we demonstrate that scaling model size is not a solution, regardless of tool usage, continual learning techniques can enable tool LLMs to both adapt faster and forget less, highlighting their potential as continual learners.

