October 3, 2024

Preble: Efficient Distributed Prompt Scheduling for LLM Serving


Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang

Prompts to large language models (LLMs) have evolved beyond simple user questions. For LLMs to solve complex problems, today's practice is to include domain-specific instructions, illustrations of tool usage, and/or long context such as textbook chapters in prompts. As a result, many parts of prompts repeat across requests. Recent works propose caching and reusing the KV state of prompts. However, they are all confined to single-GPU optimization, while production LLM serving systems are distributed by nature. This paper proposes Preble, the first distributed LLM serving platform that targets and optimizes for prompt sharing. We designed a distributed scheduling system that co-optimizes KV state reuse and computation load-balancing with a new scheduling algorithm and a hierarchical scheduling mechanism. Our evaluation of Preble with real workloads and request arrival patterns on two open-source LLMs shows that Preble outperforms the SOTA serving systems by 1.5X to 14.5X on average latency and 2X to 10X on p99 latency.
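The tension the abstract describes, between reusing cached KV state and balancing compute load, can be illustrated with a toy scheduler. The sketch below is not Preble's algorithm, which the abstract does not spell out; the `Gpu` class, the `schedule` function, and the cost formula (existing load plus tokens that would have to be recomputed) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical model of one GPU: cached prompt prefixes and pending work."""
    name: str
    cached_prefixes: list[tuple[int, ...]] = field(default_factory=list)
    load: int = 0  # tokens this GPU still has to compute (assumed load metric)


def shared_prefix_len(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def best_cached_len(gpu: Gpu, prompt: tuple[int, ...]) -> int:
    """Longest prefix of `prompt` whose KV state this GPU could reuse."""
    return max(
        (shared_prefix_len(p, prompt) for p in gpu.cached_prefixes),
        default=0,
    )


def schedule(gpus: list[Gpu], prompt: tuple[int, ...]) -> Gpu:
    """Pick the GPU with the lowest assumed cost and record the assignment.

    Assumed cost = current load + prompt tokens that are not already cached.
    A GPU holding a long matching prefix wins unless it is heavily loaded.
    """
    def cost(gpu: Gpu) -> int:
        return gpu.load + len(prompt) - best_cached_len(gpu, prompt)

    chosen = min(gpus, key=cost)
    chosen.load += len(prompt) - best_cached_len(chosen, prompt)
    chosen.cached_prefixes.append(tuple(prompt))
    return chosen
```

Under this assumed cost, a request sharing a 100-token prefix with a lightly loaded GPU is routed to that GPU, while the same request is sent elsewhere once that GPU's pending work outweighs the recomputation it would save.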

