We introduce MetaGlyph, a symbolic language that compresses prompts by encoding instructions as mathematical symbols rather than prose. Unlike systems that require explicit decoding rules, MetaGlyph uses symbols such as $\in$ (membership) and $\Rightarrow$ (implication) that models already understand from their training data. We test whether these symbols work as ``instruction shortcuts'' that models can interpret without additional teaching. We evaluate eight models along two dimensions relevant to practitioners: scale (3B-1T parameters) and accessibility (open-source models for local deployment vs. proprietary APIs). MetaGlyph achieves 62-81% token reduction across all task types. For API-based deployments this translates directly into cost savings; for local deployments it reduces latency and memory pressure. Results vary by model. Gemini 2.5 Flash achieves 75% semantic equivalence between symbolic and prose instructions on selection tasks, but only 49.9% fidelity on the membership operator ($\in$). Kimi K2 reaches 98.1% fidelity on implication ($\Rightarrow$) and perfect (100%) accuracy on selection tasks with symbolic prompts. GPT-5.2 Chat shows the highest membership fidelity observed (91.3%), though its parse success varies across task types. Claude Haiku 4.5 achieves 100% parse success but only 26% membership fidelity. Among mid-sized models, Qwen 2.5 7B shows 62% equivalence on extraction tasks, yet mid-sized open-source models (7B-12B) overall show near-zero operator fidelity, suggesting a U-shaped relationship in which sufficient scale overcomes instruction-tuning biases.
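As a rough illustration of the compression idea, the sketch below contrasts a prose instruction with a MetaGlyph-style symbolic rewrite and estimates the token savings. The prompt pair, the operator usage, and the whitespace token proxy are all assumptions made for illustration; they are not MetaGlyph's actual mapping table or the tokenizers used in the evaluation.

```python
# Hypothetical illustration of MetaGlyph-style prompt compression.
# The prompt pair and symbol usage are invented for this sketch.
PROSE = ("If the candidate answer is a member of the allowed set, "
         "then respond with that answer; otherwise respond with NONE.")
SYMBOLIC = "ans ∈ allowed ⇒ output(ans); else output(NONE)"

def n_tokens(s: str) -> int:
    # Crude whitespace proxy for token count; real savings depend on
    # each model's tokenizer and how it splits the Unicode operators.
    return len(s.split())

reduction = 1 - n_tokens(SYMBOLIC) / n_tokens(PROSE)
print(f"token reduction ≈ {reduction:.0%}")  # → token reduction ≈ 65%
```

Under this crude proxy the symbolic form lands in the same 62-81% reduction range the abstract reports, but the figure is tokenizer-dependent: models that split `∈` and `⇒` into multiple byte-level tokens would see smaller savings.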