DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures

from arxiv, 71 pages, 15 figures, 22 tables. Preprint; under preparation for journal submission. Standalone version of Chapter 7 of the lead author's PhD thesis (Dalhousie University, 2026). Replication package: https://github.com/SigmaJahan/DEFaultplusplus-Transformer-Debugging

Transformer models are widely deployed in critical AI applications, yet faults in their attention mechanisms, projections, and other internal components often degrade behavior silently without raising runtime errors. Existing fault diagnosis techniques often target generic deep neural networks and cannot identify which transformer component is responsible for an observed symptom. In this article, we present DEFault++, a hierarchical learning-based diagnostic technique that operates at three levels of abstraction: it detects whether a fault is present, classifies it into one of 12 transformer-specific fault categories (covering both attention-internal mechanisms and surrounding architectural components), and identifies the underlying root cause from up to 45 mechanisms. To facilitate both training and evaluation, we construct DEFault-bench, a benchmark of 3,739 labeled instances obtained through systematic mutation testing. These instances are created across seven transformer models and nine downstream tasks using DEForm, a transformer-specific mutation technique we developed for this purpose. DEFault++ measures runtime behavior at the level of individual transformer components. It organizes these measurements through a Fault Propagation Graph (FPG) derived from the transformer architecture. It then produces an interpretable diagnosis using prototype matching combined with supervised contrastive learning. On DEFault-bench, DEFault++ exceeds an AUROC of 0.96 for detection and a Macro-F1 of 0.85 for both categorization and root-cause diagnosis on encoder and decoder architectures. In a developer study with 21 practitioners, the accuracy of choosing correct repair actions increased from 57.1% without support to 83.3% when using DEFault++.
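The prototype-matching step mentioned above can be illustrated with a minimal sketch. This is not the DEFault++ implementation: it assumes each instance has already been mapped to an embedding vector (in the paper, by a model trained with supervised contrastive learning, which is omitted here), builds one prototype per fault category as the mean embedding, and assigns a query to the most similar prototype. The two-dimensional vectors, the category names `attention_fault` and `projection_fault`, and the choice of cosine similarity are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_prototypes(embeddings, labels):
    """Return one prototype per label: the element-wise mean of that label's embeddings."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def nearest_prototype(embedding, prototypes):
    """Return the best-matching label and the similarity score for every label."""
    scores = {label: cosine(embedding, proto) for label, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): embeddings of labeled faulty instances.
train_embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
train_labels = [
    "attention_fault", "attention_fault",
    "projection_fault", "projection_fault",
]

prototypes = class_prototypes(train_embeddings, train_labels)
label, scores = nearest_prototype([0.8, 0.2], prototypes)
print(label, scores)
```

Because the prediction is the nearest prototype, the per-category similarity scores can be shown alongside the label, which is one way such a diagnosis can stay interpretable.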
