Efficient Multi-user Offloading of Personalized Diffusion Models: A DRL-Convex Hybrid Solution

With the impressive generative capabilities of diffusion models, personalized content synthesis has emerged as one of their most highly anticipated applications. However, the large model sizes and the iterative nature of inference make it difficult to deploy personalized diffusion models broadly on local devices with varying computational power. To this end, we propose a novel framework for efficient multi-user offloading of personalized diffusion models that accommodates a variable number of users, diverse user computational capabilities, and fluctuating available computational resources on the edge server. To enhance computational efficiency and reduce the storage burden on edge servers, we first propose a tailored multi-user hybrid inference scheme, in which the inference process for each user is split into two phases at an optimizable split point. The initial phase of inference is processed on a cluster-wide model using batching techniques, generating low-level semantic information corresponding to each user's prompt. Each user then employs their own personalized model to add further details in the later inference phase. Given the constraints on edge server computational resources and users' preferences for low latency and high accuracy, we model the joint optimization of each user's offloading request handling and split point as an extension of the Generalized Quadratic Assignment Problem (GQAP). Our objective is to maximize a comprehensive metric that accounts for both latency and accuracy across all users. To tackle this NP-hard problem, we transform the GQAP into an adaptive decision sequence, model it as a Markov decision process, and develop a hybrid solution that combines deep reinforcement learning with convex optimization techniques. Simulation results validate the effectiveness of our framework, demonstrating superior optimality and lower complexity compared to traditional methods.
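The two-phase hybrid inference described above can be illustrated with a minimal sketch. It shows only the control flow: a shared model is applied in batches up to each user's split point, and a per-user personalized model then handles the remaining steps. All names here (`hybrid_inference`, `make_shared_step`, `make_personal_step`, `split_points`) are hypothetical, the "denoisers" are toy arithmetic stand-ins, and the choice of split points (the DRL and convex optimization part) is not modeled. This is not the authors' implementation.

```python
from typing import Callable, Dict, List

Latent = List[float]
SharedStep = Callable[[List[Latent], int], List[Latent]]   # batched, cluster-wide model
PersonalStep = Callable[[Latent, int], Latent]              # one user's personalized model


def make_shared_step(batch_sizes: List[int]) -> SharedStep:
    """Toy shared denoiser: halves every value and records the batch size."""
    def step(batch: List[Latent], t: int) -> List[Latent]:
        batch_sizes.append(len(batch))
        return [[v * 0.5 for v in latent] for latent in batch]
    return step


def make_personal_step(bias: float) -> PersonalStep:
    """Toy personalized denoiser: adds a user-specific bias to every value."""
    def step(latent: Latent, t: int) -> Latent:
        return [v + bias for v in latent]
    return step


def hybrid_inference(
    latents: Dict[str, Latent],
    split_points: Dict[str, int],
    shared_step: SharedStep,
    personal_steps: Dict[str, PersonalStep],
    total_steps: int,
) -> Dict[str, Latent]:
    """Run steps [0, split) on the shared model, then [split, total) per user."""
    current = {user: list(x) for user, x in latents.items()}

    # Phase 1: at each step, batch together every user whose split point
    # has not been reached yet and run them through the shared model.
    for t in range(total_steps):
        active = [user for user in current if t < split_points[user]]
        if not active:
            break
        outputs = shared_step([current[user] for user in active], t)
        for user, out in zip(active, outputs):
            current[user] = out

    # Phase 2: each user finishes the remaining steps with their own model.
    for user in current:
        for t in range(split_points[user], total_steps):
            current[user] = personal_steps[user](current[user], t)
    return current
```

In this sketch a later split point keeps a user in the shared batch for more steps and leaves fewer steps for the personalized model, which is the trade-off the split point is meant to control.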

