Language models as a service (LMaaS) enables users to accomplish tasks without specialized knowledge, simply by paying a service provider. However, a large number of providers offer large language model (LLM) services that vary in latency, performance, and pricing. Consequently, constructing a cost-saving LLM service invocation strategy that delivers low-latency, high-performance responses and meets specific task demands has become a pressing challenge. This paper provides a comprehensive overview of LLM service invocation methods. Technically, we give a formal definition of the problem of constructing an effective invocation strategy in LMaaS and present an LLM service invocation framework. The framework classifies existing methods into four components, namely input abstract, semantic cache, solution design, and output enhancement, which can be freely combined with one another. Finally, we highlight open challenges that have not yet been well addressed in this task and shed light on future research.