We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries, a style of computing often seen in user-facing applications that support image processing and natural language processing. In such systems, coscheduling GPU memory management and task placement is a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently. Compass places tasks where their data dependencies will be satisfied, collocates tasks from the same job when doing so will not overload the host or its GPU, and manages GPU memory efficiently. Compared with other state-of-the-art schedulers, Compass significantly reduces completion times while requiring the same or even fewer resources. In one case, it needed only half as many servers to process the same workload.
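The abstract names three placement preferences but does not give Compass's actual algorithm. The Python sketch below is only a hypothetical illustration of how those preferences could be combined into a greedy placement rule. The names `Worker`, `Task`, `place`, `fetch_cost_mb`, and `MAX_QUEUED` are illustrative assumptions, and so are the specific priority order (resident data first, then same-job collocation, then load), the queue-length overload test, and the decision not to model eviction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

MAX_QUEUED = 4  # hypothetical per-worker overload threshold (assumption)


@dataclass
class Worker:
    name: str
    gpu_mem_total_mb: int
    gpu_mem_used_mb: int = 0
    queued_tasks: int = 0
    # Objects (e.g. model weights) already resident in this worker's GPU memory.
    cached: Set[str] = field(default_factory=set)
    # Jobs that already have at least one task placed on this worker.
    jobs: Set[str] = field(default_factory=set)


@dataclass
class Task:
    job_id: str
    needs: Set[str]  # objects this task depends on


def fetch_cost_mb(worker: Worker, task: Task, sizes: Dict[str, int]) -> int:
    """MB that would have to be loaded into GPU memory on this worker."""
    return sum(sizes[obj] for obj in task.needs - worker.cached)


def place(task: Task, workers: List[Worker], sizes: Dict[str, int]) -> Optional[Worker]:
    """Greedy placement sketch; returns the chosen worker, or None if none fits."""
    best: Optional[Worker] = None
    best_key: Optional[Tuple[int, int, int]] = None
    for w in workers:
        fetch = fetch_cost_mb(w, task, sizes)
        # Skip workers that are overloaded or lack free GPU memory (no eviction modeled).
        if w.queued_tasks >= MAX_QUEUED or w.gpu_mem_used_mb + fetch > w.gpu_mem_total_mb:
            continue
        # Lexicographic preference, lower is better:
        # 1) least data to load, 2) same job already here, 3) shortest queue.
        key = (fetch, 0 if task.job_id in w.jobs else 1, w.queued_tasks)
        if best_key is None or key < best_key:
            best, best_key = w, key
    if best is not None:
        # Charge memory for the missing objects before marking them resident.
        best.gpu_mem_used_mb += fetch_cost_mb(best, task, sizes)
        best.cached |= task.needs
        best.jobs.add(task.job_id)
        best.queued_tasks += 1
    return best


if __name__ == "__main__":
    sizes = {"resnet": 100, "bert": 400}
    w1 = Worker("w1", 1000, gpu_mem_used_mb=100, cached={"resnet"})
    w2 = Worker("w2", 1000)
    print(place(Task("j1", {"resnet"}), [w1, w2], sizes).name)  # w1 (model already resident)
    print(place(Task("j1", {"bert"}), [w1, w2], sizes).name)    # w1 (tie on fetch, collocated with j1)
```

Ordering the criteria lexicographically is a simple way to make data locality dominate and to use collocation only as a tie-breaker. A real scheduler would more likely weigh these factors against one another and would also evict cold objects to free GPU memory.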