The rise of Foundation Models (FMs) such as Large Language Models (LLMs) is revolutionizing software development. Despite impressive prototypes, transforming FMware into production-ready products demands complex engineering across various domains. A critical but often overlooked aspect is performance engineering, which aims to ensure that FMware meets performance goals such as throughput and latency, thereby avoiding user dissatisfaction and financial loss. Performance considerations are often an afterthought, leading to costly optimization efforts after deployment. FMware's high demand for computational resources makes efficient hardware use essential, and continuous performance engineering is needed to prevent performance degradation. This paper highlights the significance of Software Performance Engineering (SPE) in FMware and identifies four key challenges: cognitive architecture design (i.e., the structural design that defines how AI components interact, reason, and interface with classical software components), communication protocols, tuning and optimization, and deployment. These challenges are drawn from literature surveys and from experience developing an in-house FMware system. We discuss the problems, current practices, and innovative paths for the software engineering community.