We present SPL (Structured Prompt Language), a declarative SQL-inspired language that treats large language models as generative knowledge bases and their context windows as constrained resources. SPL provides explicit WITH BUDGET/LIMIT token management, an automatic query optimizer, EXPLAIN transparency analogous to SQL's EXPLAIN ANALYZE, and native integration of retrieval-augmented generation (RAG) and persistent memory in a single declarative framework. Five extensions demonstrate the paradigm's breadth: (1) Text2SPL, multilingual NL-to-SPL translation; (2) Mixture-of-Models (MoM) routing, which dispatches each PROMPT to a domain-specialist model at runtime; (3) Logical Chunking, a strategy for documents that exceed a single context window: expressed through SPL's existing CTE syntax with no new constructs, it decomposes a large query into a Map-Reduce pipeline that reduces attention cost from O(N^2) to O(N^2/k) and runs identically on cloud (parallel) or local hardware (sequential); (4) SPL-flow, a declarative agentic orchestration layer that extends SPL into resilient agentic pipelines with a three-tier provider fallback strategy (Ollama -> OpenRouter -> self-healing retry) fully transparent to the .spl script; and (5) BENCHMARK, parallel multi-model comparison with automatic winner persistence. We provide a formal EBNF grammar, two pip-installable Python packages (spl-llm and spl-flow), and a comparison against Prompty, DSPy, and LMQL. SPL reduces prompt boilerplate by 65% on average, surfaces a 68x cost spread across model tiers as a pre-execution signal, and runs the identical .spl script, without modification, for $0.002 on OpenRouter or at zero marginal cost on a local Ollama instance.
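The O(N^2) -> O(N^2/k) claim in extension (3) follows from a short cost argument; the sketch below counts only map-stage attention and omits the lower-order cost of the reduce step over the k partial results:

```latex
% Self-attention over a document of N tokens costs O(N^2).
% Logical Chunking splits the document into k chunks of N/k tokens,
% and each map-stage call attends only within its own chunk:
\[
  \underbrace{k}_{\text{chunks}}
  \cdot
  \underbrace{\left(\tfrac{N}{k}\right)^{2}}_{\text{attention per chunk}}
  \;=\;
  \frac{N^{2}}{k}
\]
% The reduce stage then combines the k partial outputs; its input is
% far shorter than N, so it contributes only a lower-order term.
```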