Despite the significant strides made by generative AI in just a few short years, its future progress is constrained by the challenge of building modular and robust systems. The ability to build such systems has been a cornerstone of past technological revolutions, which relied on combining components to create increasingly sophisticated and reliable systems. Cars, airplanes, computers, and software consist of components (such as engines, wheels, CPUs, and libraries) that can be assembled, debugged, and replaced. A key tool for building such reliable and modular systems is specification: the precise description of the expected behavior, inputs, and outputs of each component. However, the generality of LLMs and the inherent ambiguity of natural language make defining specifications for LLM-based components (e.g., agents) both a challenging and urgent problem. In this paper, we discuss the progress the field has made so far, through advances like structured outputs, process supervision, and test-time compute, and outline several future directions for research to enable the development of modular and reliable LLM-based systems through improved specifications.