Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management. Yet many intelligent applications run on AI/ML platforms that optimize for high throughput even at the cost of high tail latency. Cascade is a new AI/ML hosting platform intended to untangle this puzzle. Its innovations include a legacy-friendly storage layer that moves data with minimal copying and a "fast path" that collocates data and computation to maximize responsiveness. Our evaluation shows that Cascade reduces latency by orders of magnitude with no loss of throughput.