Large Language Models (LLMs) are a natural fit for answering complex queries about data streams, but the high computational cost of LLM inference makes them impractical for many such tasks. We propose online cascade learning, the first approach to addressing this challenge. The objective is to learn a "cascade" of models, starting with lower-capacity models (such as logistic regressors) and ending with a powerful LLM, along with a deferral policy that determines which model is used on a given input. We formulate the task of learning cascades online as an imitation-learning problem and give a no-regret algorithm for it. Experimental results across four benchmarks show that our method is on par with LLMs in accuracy while cutting inference costs by as much as 90%, underscoring its efficacy and adaptability in stream processing.
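To make the cascade idea concrete, the following is a minimal sketch, not the algorithm from this work. It assumes a two-level cascade (one small online logistic regressor and one expensive model), a fixed confidence threshold in place of a learned deferral policy, and a hypothetical `expensive_oracle` function standing in for the LLM. Whenever the small model defers, the expensive model's answer is returned and also used as a training label for the small model, which is the imitation-style feedback loop in its simplest form. All class, function, and parameter names here are illustrative assumptions.

```python
import math
import random


class OnlineLogReg:
    """Tiny binary logistic regressor updated one example at a time (SGD)."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob(self, x):
        """Return the predicted probability of class 1 for feature vector x."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One gradient step on the log loss for the labeled example (x, y)."""
        g = self.prob(x) - y
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g


def expensive_oracle(x):
    """Hypothetical stand-in for the LLM: a fixed labeling rule."""
    return 1 if x[0] + x[1] > 0 else 0


def run_cascade(stream, small, oracle, threshold=0.9):
    """Process a stream with a two-level cascade.

    The small model answers when its confidence reaches `threshold`;
    otherwise the input is deferred to `oracle`, whose answer is both
    returned and used to train the small model (assumed deferral rule).
    """
    preds, oracle_calls = [], 0
    for x in stream:
        p = small.prob(x)
        if max(p, 1.0 - p) >= threshold:
            preds.append(1 if p >= 0.5 else 0)  # cheap path
        else:
            y = oracle(x)                       # expensive path
            oracle_calls += 1
            small.update(x, y)                  # imitate the expensive model
            preds.append(y)
    return preds, oracle_calls


rng = random.Random(0)
stream = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
preds, calls = run_cascade(stream, OnlineLogReg(dim=2), expensive_oracle)
print(f"oracle calls: {calls} / {len(stream)}")
```

In this sketch the threshold controls the cost/accuracy trade-off: raising it sends more inputs to the expensive model, lowering it trusts the small model more often. A learned deferral policy, as described in the abstract, would replace this fixed rule.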