With the impressive generative capabilities of diffusion models, personalized content synthesis has emerged as one of their most highly anticipated applications. However, the large model sizes and the iterative nature of inference make it difficult to deploy personalized diffusion models broadly on local devices with varying computational power. To this end, we propose a novel framework for efficient multi-user offloading of personalized diffusion models that accommodates a variable number of users, diverse user computational capabilities, and fluctuating computational resources on the edge server. To enhance computational efficiency and reduce the storage burden on edge servers, we first propose a tailored multi-user hybrid inference scheme in which the inference process for each user is split into two phases at an optimizable split point. The initial phase is executed on a cluster-wide model using batching techniques and generates low-level semantic information corresponding to each user's prompt. Each user then employs their own personalized model to add further details in the later phase. Given the constraints on edge server computational resources and users' preferences for low latency and high accuracy, we formulate the joint optimization of each user's offloading request handling and split point as an extension of the Generalized Quadratic Assignment Problem (GQAP). Our objective is to maximize a comprehensive metric that accounts for both latency and accuracy across all users. To tackle this NP-hard problem, we transform the GQAP into an adaptive decision sequence, model it as a Markov decision process, and develop a hybrid solution that combines deep reinforcement learning with convex optimization techniques. Simulation results validate the effectiveness of our framework, demonstrating superior optimality and low complexity compared to traditional methods.
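The two-phase hybrid inference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `hybrid_inference`, the toy denoisers, the user names, and the representation of a latent as a list of floats are all assumptions made for illustration, and the split points are simply given as input rather than optimized.

```python
from typing import Callable, Dict, List

# Hypothetical toy types: a "latent" is a list of floats.
Latent = List[float]
BatchDenoiser = Callable[[List[Latent], int], List[Latent]]  # shared, batched
Denoiser = Callable[[Latent, int], Latent]                   # per-user


def hybrid_inference(
    latents: Dict[str, Latent],
    split_points: Dict[str, int],
    total_steps: int,
    shared_model: BatchDenoiser,
    personal_models: Dict[str, Denoiser],
) -> Dict[str, Latent]:
    """Run phase 1 batched on a shared model, phase 2 on each user's model."""
    state = {u: list(z) for u, z in latents.items()}

    # Phase 1 (edge server): at step t, batch every user whose split point
    # has not been reached yet and denoise them in a single call.
    for t in range(max(split_points.values(), default=0)):
        active = [u for u in state if t < split_points[u]]
        outputs = shared_model([state[u] for u in active], t)
        for u, z in zip(active, outputs):
            state[u] = z

    # Phase 2 (user device): each user finishes the remaining steps with
    # their own personalized model.
    for u in state:
        for t in range(split_points[u], total_steps):
            state[u] = personal_models[u](state[u], t)
    return state


# Toy stand-ins for real denoisers (assumptions, chosen to be easy to trace).
def toy_shared(batch: List[Latent], t: int) -> List[Latent]:
    return [[0.5 * x for x in z] for z in batch]


def make_toy_personal(offset: float) -> Denoiser:
    return lambda z, t: [x + offset for x in z]


result = hybrid_inference(
    latents={"alice": [8.0], "bob": [8.0]},
    split_points={"alice": 3, "bob": 1},
    total_steps=4,
    shared_model=toy_shared,
    personal_models={
        "alice": make_toy_personal(1.0),
        "bob": make_toy_personal(2.0),
    },
)
print(result)
```

In this toy run, the user with the later split point spends three of four steps in the shared batch, while the other leaves the batch after one step and runs three steps locally. Choosing those split points jointly with the offloading decisions is the optimization problem the abstract formulates; the sketch does not attempt it.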