Resource allocation is fundamental for cloud systems to ensure efficient resource sharing among tenants. However, the scale of such optimization problems has outgrown the capabilities of commercial solvers traditionally employed in production. To scale up resource allocation, prior approaches either tailor solutions to specific problems or rely on assumptions tied to particular workloads. In this work, we revisit real-world resource allocation problems and uncover a common underlying structure: a vast majority of these problems are inherently separable, i.e., they optimize the aggregate utility of individual resource and demand allocations, under separate constraints for each resource and each demand. Building on this insight, we develop DeDe, a general, scalable, and theoretically grounded framework for accelerating resource allocation through a "decouple and decompose" approach. DeDe systematically decouples entangled resource and demand constraints, thereby decomposing the overall optimization into alternating per-resource and per-demand allocations, which can then be solved efficiently and in parallel. We have implemented DeDe as a library extension to an open-source solver, maintaining a familiar user interface. Experimental results across three prominent resource allocation tasks -- traffic engineering, cluster scheduling, and load balancing -- demonstrate DeDe's substantial speedups and robust allocation quality.
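To make the separable structure concrete, the sketch below works through a toy instance. It is only an illustration under stated assumptions, not DeDe's implementation or interface. It assumes a linear utility, one capacity constraint per resource (row sums), and one cap per demand (column sums). It also assumes an ADMM-style splitting with an auxiliary copy `z` and scaled multipliers `u`, which the abstract does not name. All function and parameter names (`project_capped`, `alternating_allocate`, `rho`, `iters`) are hypothetical.

```python
def project_capped(v, cap):
    """Euclidean projection of v onto {y : y >= 0, sum(y) <= cap}, with cap > 0."""
    clipped = [max(a, 0.0) for a in v]
    if sum(clipped) <= cap:
        return clipped
    # Cap is binding: project onto {y >= 0, sum(y) == cap} with the sort-based rule.
    theta, running = 0.0, 0.0
    for k, val in enumerate(sorted(v, reverse=True), start=1):
        running += val
        shift = (running - cap) / k
        if val - shift > 0:
            theta = shift
    return [max(a - theta, 0.0) for a in v]


def alternating_allocate(weights, capacity, demand, rho=1.0, iters=200):
    """Maximize sum_ij weights[i][j] * x[i][j] subject to
    sum_j x[i][j] <= capacity[i], sum_i x[i][j] <= demand[j], x >= 0.

    Illustrative ADMM-style splitting (an assumption, see the text above):
    x carries the per-resource constraints, z the per-demand constraints,
    and u pushes the two copies toward agreement.
    """
    n, m = len(capacity), len(demand)
    x = [[0.0] * m for _ in range(n)]
    z = [[0.0] * m for _ in range(n)]
    u = [[0.0] * m for _ in range(n)]
    for _ in range(iters):
        # Per-resource step: rows are independent, so they could run in parallel.
        for i in range(n):
            target = [z[i][j] - u[i][j] + weights[i][j] / rho for j in range(m)]
            x[i] = project_capped(target, capacity[i])
        # Per-demand step: columns are independent as well.
        for j in range(m):
            target = [x[i][j] + u[i][j] for i in range(n)]
            column = project_capped(target, demand[j])
            for i in range(n):
                z[i][j] = column[i]
        # Multiplier update on the disagreement between the two copies.
        for i in range(n):
            for j in range(m):
                u[i][j] += x[i][j] - z[i][j]
    return x, z


# Toy instance: 2 resources, 3 demands, unit utility everywhere.
capacity = [4.0, 6.0]
demand = [3.0, 3.0, 5.0]
weights = [[1.0] * 3 for _ in range(2)]
x, z = alternating_allocate(weights, capacity, demand)
print("total allocated:", sum(map(sum, z)))
```

In this instance the total allocation cannot exceed min(4 + 6, 3 + 3 + 5) = 10, so the printed total can be compared against that bound. The point of the sketch is that each inner loop touches only one row or one column, which is what allows the subproblems to be solved independently.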