We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries, a computing style often seen in user-facing applications for image processing and natural language processing. In such systems, coscheduling GPU memory management and task placement is a promising opportunity. We propose Navigator, a novel framework that unifies these functions to reduce job latency while using resources efficiently: it places tasks where their data dependencies will be satisfied, collocates tasks from the same job (when this will not overload the host or its GPU), and manages GPU memory efficiently. Comparison with other state-of-the-art schedulers shows a significant reduction in completion times while requiring the same amount of resources or even fewer. In one case, just half the servers were needed to process the same workload.