"As many of us know from bitter experience, the policies provided in extant operating systems, which are claimed to work well and behave fairly 'on the average', often fail to do so in the special cases important to us" [Wulf et al. 1974]. Written in 1974, these words motivated moving policy decisions into user-space. Today, as warehouse-scale computers (WSCs) have become ubiquitous, it is time to move policy decisions away from individual servers altogether. Built-in policies are complex and often exhibit bad performance at scale. Meanwhile, the highly-controlled WSC setting presents opportunities to improve performance and predictability. We propose moving all policy decisions from the OS kernel to the cluster manager (CM), in a new paradigm we call Grape CM. In this design, the role of the kernel is reduced to monitoring, sending metrics to the CM, and executing policy decisions made by the CM. The CM uses metrics from all kernels across the WSC to make informed policy choices, sending commands back to each kernel in the cluster. We claim that Grape CM will improve performance, transparency, and simplicity. Our initial experiments show how the CM can identify the optimal set of huge pages for any workload or improve memcached latency by 15%.
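The division of labor described in the abstract (kernels report metrics, the CM decides, kernels execute) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the `Metrics` and `Command` schemas, the `ClusterManager` class, the `huge_page_budget` knob, and the choice of TLB-miss counts as the reported metric are all hypothetical. The placeholder policy simply promotes the memory regions with the most TLB misses across the whole cluster, which is one way a cluster-wide view can differ from a per-server one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metrics:
    """One report from a kernel (hypothetical schema, not from the paper)."""
    node: str
    tlb_misses: dict[str, int]  # TLB misses per memory region id


@dataclass
class Command:
    """A policy decision sent back to one kernel, which only executes it."""
    node: str
    promote_regions: list[str]  # regions to back with huge pages


@dataclass
class ClusterManager:
    """Holds all policy; kernels only monitor, report, and apply commands."""
    huge_page_budget: int  # assumed cluster-wide cap on promoted regions
    latest: dict[str, Metrics] = field(default_factory=dict)

    def report(self, metrics: Metrics) -> None:
        # Keep only the most recent report from each node.
        self.latest[metrics.node] = metrics

    def decide(self) -> list[Command]:
        # Rank every (node, region) pair in the cluster by miss count,
        # breaking ties by name so the result is deterministic.
        candidates = [
            (misses, node, region)
            for node, m in self.latest.items()
            for region, misses in m.tlb_misses.items()
        ]
        candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

        # Spend the cluster-wide budget on the hottest regions.
        chosen: dict[str, list[str]] = {node: [] for node in self.latest}
        for _, node, region in candidates[: self.huge_page_budget]:
            chosen[node].append(region)

        # One command per reporting node, even if it has nothing to promote.
        return [Command(node, regions) for node, regions in sorted(chosen.items())]
```

With a budget of 2 and reports `Metrics("a", {"r0": 900, "r1": 10})` and `Metrics("b", {"r0": 500, "r1": 700})`, `decide()` returns commands promoting `r0` on node `a` and `r1` on node `b`. The point of the sketch is only the control flow: a real policy would replace the ranking step while the kernels' role stays the same.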