Ekka: Automated Diagnosis of Silent Errors in LLM Inference

LLM serving frameworks are evolving quickly, with complex software stacks and a large number of optimizations. This rapid development can introduce silent errors, in which output quality degrades without any explicit error signal. Diagnosing silent errors is notoriously difficult because of the substantial semantic gap between high-level symptoms and low-level root causes. We observe that, because semantically correct reference implementations exist, diagnosing a silent error can be framed as a differential debugging problem. We propose Ekka, an automated diagnosis system that identifies root causes by systematically aligning and comparing intermediate execution states between a target framework and a reference framework. We construct a benchmark of real-world silent errors from popular serving frameworks, on which Ekka achieves 80% pass@1 and 88% pass@5 diagnosis accuracy, outperforming state-of-the-art systems. Ekka also diagnosed 4 new silent errors in serving frameworks, all of which have been confirmed by the developers.
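To make the differential debugging framing concrete, the following is a minimal sketch of comparing intermediate states between a target and a reference run. It is not Ekka's implementation: the function names (`max_abs_diff`, `first_divergence`), the state names (`embed`, `attn_out`, `mlp_out`), the values, and the tolerance are all hypothetical, and the sketch assumes the two runs have already been aligned so that matching states share a name.

```python
from typing import Dict, Optional, Sequence, Tuple


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference; infinite if shapes differ."""
    if len(a) != len(b):
        return float("inf")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergence(
    target: Dict[str, Sequence[float]],
    reference: Dict[str, Sequence[float]],
    order: Sequence[str],
    tol: float = 1e-3,
) -> Optional[Tuple[str, float]]:
    """Return the first state (in execution order) whose values differ by
    more than `tol`, or None if every shared state agrees."""
    for name in order:
        # Skip states that only one side recorded (assumed unalignable here).
        if name not in target or name not in reference:
            continue
        diff = max_abs_diff(target[name], reference[name])
        if diff > tol:
            return name, diff
    return None


# Hypothetical recorded states from a reference run and a target run.
reference_states = {
    "embed": [0.1, 0.2],
    "attn_out": [0.5, -0.3],
    "mlp_out": [1.0, 0.9],
}
target_states = {
    "embed": [0.1, 0.2],
    "attn_out": [0.5, -0.1],  # diverges here
    "mlp_out": [1.4, 0.2],    # downstream states diverge as a consequence
}
execution_order = ["embed", "attn_out", "mlp_out"]

result = first_divergence(target_states, reference_states, execution_order)
print(result)
```

The earliest diverging state, rather than the largest difference, is reported because later differences are often consequences of the first one. In a real serving stack, aligning states across two frameworks is the hard part, and this sketch assumes it away.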

