The rapid advancement and widespread deployment of foundation model (FM) based systems have revolutionized numerous applications across various domains. However, their fast-growing capabilities and autonomy have also raised significant concerns about responsible AI and AI safety. Recently, there has been increasing attention toward implementing guardrails to ensure that the runtime behavior of FM-based systems is safe and responsible. Given the early stage of FMs and their applications (such as agents), the design of guardrails has not yet been systematically studied. It remains underexplored which software qualities should be considered when designing guardrails and how these qualities can be ensured from a software architecture perspective. Therefore, in this paper, we present a taxonomy of guardrails to classify and compare their characteristics and design options. Our taxonomy is organized into three main categories: the motivation behind adopting runtime guardrails, the quality attributes to consider, and the design options available. This taxonomy provides structured and concrete guidance for making architectural design decisions when designing guardrails and highlights the trade-offs arising from those decisions.