Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for user-facing tasks ranging from software engineering to enterprise automation, which makes their end-to-end latency a critical concern. Unlike in traditional applications, their execution time is dominated by external components, which traditional language optimization systems such as optimizing compilers cannot handle. To address this problem, we develop PopPy, a system that uncovers parallelization opportunities in Python applications that invoke these heavy external components, including those used in compound AI applications. PopPy supports an expressive fragment of Python and requires minimal developer input to uncover parallelism. It combines an ahead-of-time compiler with a runtime to address three key challenges in extracting parallelism from Python applications: language complexity, dynamic dispatch, and variable mutation. On a set of real-world compound AI applications, PopPy achieves speedups of up to $6.4\times$ in end-to-end execution time over standard Python execution while preserving the sequential program semantics.
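To make the kind of parallelization opportunity described above concrete, the following is a minimal hand-written sketch; it is not PopPy's API or its implementation. The function `call_model` is a hypothetical stand-in for a slow external component (simulated here with `time.sleep`), and the thread pool is an assumed mechanism chosen only for illustration. Two calls that do not depend on each other are overlapped, while the third call, which consumes both results, still runs last, so the result matches the sequential program.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_model(prompt: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a slow external component (e.g. a model call)."""
    time.sleep(delay)  # simulates network / inference latency
    return f"answer({prompt})"


def sequential(question: str) -> str:
    # Two independent calls, executed one after the other.
    summary = call_model(f"summarize: {question}")
    keywords = call_model(f"keywords: {question}")
    # This call depends on both results, so it must wait for them.
    return call_model(f"combine: {summary} | {keywords}")


def parallel(question: str) -> str:
    # Same program, with the two independent calls overlapped by hand.
    with ThreadPoolExecutor(max_workers=2) as pool:
        summary_future = pool.submit(call_model, f"summarize: {question}")
        keywords_future = pool.submit(call_model, f"keywords: {question}")
        summary = summary_future.result()
        keywords = keywords_future.result()
    # The dependent call still runs only after both inputs are available.
    return call_model(f"combine: {summary} | {keywords}")


def timed(fn, arg):
    """Return (result, elapsed seconds) for a single call to fn(arg)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start
```

Calling `timed(sequential, "q")` and `timed(parallel, "q")` should return the same string, with the second taking roughly two simulated call latencies instead of three. Finding such independent calls automatically, in the presence of dynamic dispatch and variable mutation, is the part this sketch does by hand.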