Today's Internet infrastructure is centered around content retrieval over HTTP, with middleboxes (e.g., HTTP proxies) playing a crucial role in performance, security, and cost-effectiveness. We envision a future where Internet communication will be dominated by "prompts" sent to generative AI models. For this, we will need proxies that provide functions similar to those of HTTP proxies (e.g., caching, routing, compression) while dealing with the unique challenges and opportunities of prompt-based communication. As a first step toward supporting prompt-based communication, we present LLMBridge, an LLM proxy designed for cost-conscious users, such as those in developing regions and education (e.g., students and instructors). LLMBridge supports three key optimizations: model selection (routing prompts to the most suitable model), context management (intelligently reducing the amount of context), and semantic caching (serving prompts using local models and vector databases). These optimizations introduce trade-offs between cost and quality, which applications navigate through a high-level, bidirectional interface. As case studies, we deploy LLMBridge in two cost-sensitive settings: a WhatsApp-based Q&A service and a university classroom environment. The WhatsApp service has been live for over twelve months, serving 100+ users and handling more than 14.7K requests. In parallel, we exposed LLMBridge to students across three computer science courses over a semester, where it supported diverse LLM-powered applications, such as reasoning agents and chatbots, and handled an average of 500 requests per day. We report on deployment experiences across both settings and use the collected workloads to benchmark the effectiveness of various cost-optimization strategies, analyzing their trade-offs in cost, latency, and response quality.
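The semantic-caching idea above can be sketched in miniature: embed each incoming prompt, search previously answered prompts for a near neighbor, and serve the cached response on a sufficiently close match. The sketch below is illustrative only, not LLMBridge's implementation; the `embed` function is a toy character-trigram embedding standing in for a real local embedding model, and a plain list stands in for a vector database. The similarity `threshold` is a hypothetical tuning knob governing the cost/quality trade-off.

```python
import math

def embed(text):
    # Toy embedding: character-trigram counts. A real deployment would
    # use a local embedding model and a vector database instead.
    vec = {}
    t = text.lower()
    for i in range(len(t) - 2):
        tri = t[i:i + 3]
        vec[tri] = vec.get(tri, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.7):
        self.threshold = threshold   # min similarity to count as a hit
        self.entries = []            # (embedding, prompt, response)

    def lookup(self, prompt):
        # Return the cached response of the most similar prompt,
        # or None if nothing clears the threshold (cache miss).
        q = embed(prompt)
        best_response, best_sim = None, 0.0
        for emb, _, response in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def insert(self, prompt, response):
        self.entries.append((embed(prompt), prompt, response))

cache = SemanticCache(threshold=0.7)
cache.insert("What is the capital of France?", "Paris")
hit = cache.lookup("what is the capital of france")   # near-duplicate: hit
miss = cache.lookup("Explain TCP congestion control") # unrelated: miss
```

A lower threshold serves more prompts from cache (cheaper, faster) at the risk of returning an answer to a subtly different question; this is the kind of cost/quality trade-off the abstract says applications navigate through the proxy's interface.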