Any-to-Any models are an emerging class of multimodal models that accept combinations of multimodal data (e.g., text, image, video, audio) as input and generate combinations of them as output. Serving these models is challenging: requests with different input and output modalities traverse different paths through the model computation graph, and each component of the model has different scaling characteristics. We present Cornserve, a distributed serving system for generic Any-to-Any models. Cornserve provides a flexible task abstraction for expressing Any-to-Any model computation graphs, enabling component disaggregation and independent scaling. The distributed runtime dispatches compute to the data plane via an efficient record-and-replay execution model that keeps track of data dependencies, and forwards tensor data between components directly from the producer to the consumer. Built on Kubernetes with approximately 23K new lines of Python, Cornserve supports diverse Any-to-Any models and delivers up to 3.81$\times$ higher throughput and 5.79$\times$ lower tail latency. Cornserve is open-source, and the demo video is available on YouTube.
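To make the record-and-replay idea concrete, the following is a minimal sketch and not Cornserve's actual API. It assumes that a task is ordinary Python code that invokes model components, that a first pass records each invocation together with the placeholders it consumes, and that a second pass replays the recorded invocations in dependency order. All names here (`Ref`, `Node`, `Recorder`, `replay`, `describe_image`, and the `encoder` and `llm` components) are hypothetical, and plain strings stand in for tensors.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    """Placeholder for a value that a component will produce later."""
    node_id: int


@dataclass
class Node:
    """One recorded component invocation and the inputs it depends on."""
    node_id: int
    component: str
    inputs: tuple  # mix of Ref placeholders and concrete values


@dataclass
class Recorder:
    """Records component calls instead of executing them."""
    nodes: list = field(default_factory=list)

    def call(self, component: str, *inputs) -> Ref:
        node = Node(len(self.nodes), component, inputs)
        self.nodes.append(node)
        return Ref(node.node_id)


def replay(nodes, components):
    """Execute recorded nodes, resolving each Ref to its producer's output.

    A Ref can only be passed to a call after it was created, so the
    recording order is already a valid topological order.
    """
    results = {}
    for node in nodes:
        args = [results[a.node_id] if isinstance(a, Ref) else a
                for a in node.inputs]
        results[node.node_id] = components[node.component](*args)
    return results


# Hypothetical task: an image encoder feeds a language model.
def describe_image(rec, image, prompt):
    embedding = rec.call("encoder", image)
    return rec.call("llm", embedding, prompt)


# Stand-in components; in a real system these would be separate,
# independently scaled services exchanging tensors.
components = {
    "encoder": lambda image: f"emb({image})",
    "llm": lambda emb, prompt: f"answer[{prompt}|{emb}]",
}

rec = Recorder()
out = describe_image(rec, "cat.png", "What is this?")
results = replay(rec.nodes, components)
final = results[out.node_id]
print(final)
```

In this sketch the recorded `Ref` inputs are what tell the runtime which producer feeds which consumer. A distributed runtime could use the same information to send each output straight from the producing component to the consuming one, rather than routing it through a central coordinator.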