Any-to-Any models are an emerging class of multimodal models that accept combinations of multimodal data (e.g., text, image, video, audio) as input and generate combinations of them as output. Serving these models is challenging: requests with different input and output modalities traverse different paths through the model computation graph, and each component of the model has different scaling characteristics. We present Cornserve, a distributed serving system for generic Any-to-Any models. Cornserve provides a flexible task abstraction for expressing Any-to-Any model computation graphs, enabling component disaggregation and independent scaling. The distributed runtime dispatches compute to the data plane via an efficient record-and-replay execution model that keeps track of data dependencies, and forwards tensor data between components directly from the producer to the consumer. Built on Kubernetes with approximately 23K new lines of Python, Cornserve supports diverse Any-to-Any models and delivers up to 3.81$\times$ higher throughput and up to 5.79$\times$ lower tail latency. Cornserve is open-source, and a demo video is available on YouTube.
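To make the record-and-replay idea concrete, the following is a minimal, single-process sketch and not Cornserve's actual API. It assumes a task is written as an ordinary Python function: a first pass records each component invocation and returns a placeholder, so that data dependencies between invocations are known, and a second pass replays the recorded calls against the real components. All names here (`Recorder`, `Ref`, `replay`, `any_to_any_task`, and the component names) are hypothetical, and the components are string-producing stubs rather than models.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:
    """Placeholder for the output of a recorded call (hypothetical)."""
    index: int


@dataclass
class Call:
    """One recorded component invocation (hypothetical)."""
    component: str
    args: tuple


class Recorder:
    """Record pass: logs invocations instead of executing them."""

    def __init__(self) -> None:
        self.calls: list[Call] = []

    def invoke(self, component: str, *args: Any) -> Ref:
        self.calls.append(Call(component, args))
        return Ref(len(self.calls) - 1)

    def dependencies(self, index: int) -> list[int]:
        # A call depends on every earlier call whose Ref appears in its args.
        return [a.index for a in self.calls[index].args if isinstance(a, Ref)]


def replay(calls: list[Call], components: dict) -> list:
    """Replay pass: run recorded calls in order, resolving Refs to results."""
    results: list = []
    for call in calls:
        resolved = [results[a.index] if isinstance(a, Ref) else a
                    for a in call.args]
        results.append(components[call.component](*resolved))
    return results


def any_to_any_task(rt: Recorder, text: str, image=None, want_audio=False) -> Ref:
    """Different requests take different paths through the same graph."""
    inputs = [text]
    if image is not None:
        inputs.append(rt.invoke("image_encoder", image))
    out = rt.invoke("llm", *inputs)
    if want_audio:
        out = rt.invoke("audio_generator", out)
    return out


# Stub components standing in for independently scaled model parts.
COMPONENTS = {
    "image_encoder": lambda img: f"emb({img})",
    "llm": lambda *xs: "llm(" + ", ".join(xs) + ")",
    "audio_generator": lambda h: f"audio({h})",
}


def run(text: str, image=None, want_audio=False):
    rec = Recorder()
    out = any_to_any_task(rec, text, image, want_audio)
    results = replay(rec.calls, COMPONENTS)
    return results[out.index], rec
```

In this sketch, a text-only request records a single `llm` call, while an image-to-audio request records three calls with a linear dependency chain. In a distributed setting, the recorded dependencies are what would let a runtime tell each producer which consumer to send its tensors to; the sketch only resolves them locally.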