The growing demand for on-device large language model (LLM) inference highlights the need for efficient mobile edge computing (MEC) solutions, especially in resource-constrained settings. Speculative decoding offers a promising approach: it partitions token generation between a lightweight draft model on the mobile device and a powerful target model on the edge server, but it suffers from communication overhead and asynchronous delays. This paper is the first to propose a unified framework that jointly optimizes user association and resource allocation (UARA) to support efficient parallel speculative decoding. We solve the UARA problem with a multi-agent deep reinforcement learning algorithm. To evaluate our approach under realistic conditions, we conduct experiments using the Sionna simulator. Results show that our method reduces end-to-end latency by up to 28.0% (23.7% on average) without compromising inference accuracy, enabling scalable, low-latency LLM services in MEC systems.
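For readers unfamiliar with the draft/target split the abstract refers to, the following is a minimal, self-contained sketch of one round of standard speculative decoding (draft model proposes, target model verifies, following the usual accept/reject rule). The toy distributions, vocabulary size `VOCAB`, and draft length `GAMMA` are illustrative assumptions that stand in for the on-device draft LLM and the edge-server target LLM; this is not the paper's implementation.

```python
# Minimal sketch of a speculative decoding round, assuming the standard
# acceptance rule: accept a drafted token with prob. min(1, p_target/q_draft);
# on rejection, resample from the normalized residual max(0, p - q).
# VOCAB, GAMMA, and the toy models below are illustrative placeholders.
import numpy as np

VOCAB = 8   # toy vocabulary size (assumption)
GAMMA = 4   # draft tokens proposed per verification round (assumption)
rng = np.random.default_rng(0)

def _probs(context, seed):
    """Deterministic toy next-token distribution for a given context.

    Stands in for an LLM forward pass; seed=1 plays the on-device draft
    model, seed=2 the edge-server target model."""
    h = hash((tuple(context), seed)) % (2**32)
    logits = np.random.default_rng(h).standard_normal(VOCAB)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(prefix):
    """One round: draft GAMMA tokens, then verify them against the target."""
    # 1) Draft model proposes GAMMA tokens autoregressively (on device).
    ctx, drafted, q_dists = list(prefix), [], []
    for _ in range(GAMMA):
        q = _probs(ctx, seed=1)
        tok = int(rng.choice(VOCAB, p=q))
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)
    # 2) Target model verifies each position (a single parallel pass in practice).
    ctx, accepted = list(prefix), []
    for tok, q in zip(drafted, q_dists):
        p = _probs(ctx, seed=2)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)      # draft token accepted
            ctx.append(tok)
        else:
            # Rejected: resample from the residual max(0, p - q), normalized.
            residual = np.maximum(p - q, 0.0)
            accepted.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            return accepted
    # 3) All drafts accepted: take one bonus token from the target model.
    accepted.append(int(rng.choice(VOCAB, p=_probs(ctx, seed=2))))
    return accepted

if __name__ == "__main__":
    tokens = [0]                      # toy prompt
    while len(tokens) < 20:
        tokens.extend(speculative_step(tokens))
    print("generated:", tokens[:20])
```

In the MEC setting described above, each verification round corresponds to an uplink/downlink exchange between the mobile device and the edge server, which is the source of the communication overhead and asynchronous delays that the proposed UARA framework targets.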