We present SPL (Structured Prompt Language), a declarative SQL-inspired language that treats large language models as generative knowledge bases and their context windows as constrained resources. SPL provides explicit WITH BUDGET/LIMIT token management, an automatic query optimizer, EXPLAIN transparency analogous to SQL's EXPLAIN ANALYZE, and native integration of retrieval-augmented generation (RAG) and persistent memory in a single declarative framework. Five extensions demonstrate the paradigm's breadth: (1) Text2SPL, multilingual NL-to-SPL translation; (2) Mixture-of-Models (MoM) routing, which dispatches each PROMPT to a domain-specialist model at runtime; (3) Logical Chunking, a strategy for documents that exceed a single context window: expressed through SPL's existing CTE syntax with no new constructs, it decomposes a large query into a Map-Reduce pipeline that reduces attention cost from O(N^2) to O(N^2/k) and runs identically on cloud (parallel) or local hardware (sequential); (4) SPL-flow, a declarative agentic orchestration layer that extends SPL into resilient agentic pipelines with a three-tier provider fallback strategy (Ollama -> OpenRouter -> self-healing retry) fully transparent to the .spl script; and (5) BENCHMARK, parallel multi-model comparison with automatic winner persistence. We provide a formal EBNF grammar, two pip-installable Python packages (spl-llm and spl-flow), and a comparison against Prompty, DSPy, and LMQL. SPL reduces prompt boilerplate by 65% on average, surfaces a 68x cost spread across model tiers as a pre-execution signal, and runs the identical .spl script, without modification, for $0.002 on OpenRouter or at zero marginal cost on a local Ollama instance.
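The O(N^2) -> O(N^2/k) claim in extension (3) follows from a short cost argument; the sketch below counts only map-stage attention and omits the lower-order cost of the reduce step over the k partial results:

```latex
% Self-attention over a document of N tokens costs O(N^2).
% Logical Chunking splits the document into k chunks of N/k tokens,
% and each map-stage call attends only within its own chunk:
\[
  \underbrace{k}_{\text{chunks}}
  \cdot
  \underbrace{\left(\tfrac{N}{k}\right)^{2}}_{\text{attention per chunk}}
  \;=\;
  \frac{N^{2}}{k}
\]
% The reduce stage then combines the k partial outputs; its input is
% far shorter than N, so it contributes only a lower-order term.
```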