Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries. To curtail the frequency of these calls, one can employ a smaller language model -- a student -- which is continuously trained on the responses of the LLM. This student gradually gains proficiency in independently handling an increasing number of user requests, a process we term neural caching. The crucial element in neural caching is a policy that decides which requests should be processed by the student alone and which should be redirected to the LLM, subsequently aiding the student's learning. In this study, we focus on classification tasks, and we consider a range of classic active learning-based selection criteria as the policy. Our experiments suggest that Margin Sampling and Query by Committee bring consistent benefits across tasks and budgets.
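The routing policy described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the student exposes class probabilities, that Margin Sampling is applied as a simple threshold on the gap between the two most probable classes, and that LLM calls are capped by a fixed budget. The names `margin`, `route`, `neural_cache`, `student_predict`, `llm_label`, `threshold`, and `budget` are all hypothetical, and retraining the student on the collected examples is left out.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def route(probs: Sequence[float], threshold: float, budget_left: int) -> str:
    """Send the request to the LLM only if the student is uncertain and budget remains."""
    if budget_left > 0 and margin(probs) < threshold:
        return "llm"
    return "student"


def neural_cache(
    requests: Sequence[Hashable],
    student_predict: Callable[[Hashable], Sequence[float]],  # assumed: returns class probabilities
    llm_label: Callable[[Hashable], int],                    # assumed: wraps the costly API call
    threshold: float = 0.2,
    budget: int = 2,
) -> Tuple[List[int], List[Tuple[Hashable, int]]]:
    """Answer each request online; collect LLM-labelled examples for later student training."""
    answers: List[int] = []
    training_buffer: List[Tuple[Hashable, int]] = []
    for x in requests:
        probs = student_predict(x)
        if route(probs, threshold, budget) == "llm":
            y = llm_label(x)
            budget -= 1
            training_buffer.append((x, y))  # the student would be retrained on these
        else:
            # The student answers alone with its most probable class.
            y = max(range(len(probs)), key=lambda i: probs[i])
        answers.append(y)
    return answers, training_buffer
```

A Query by Committee variant would, under the same assumptions, replace `margin` with a disagreement score computed over several student models while keeping the routing loop unchanged.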