Bug-Report-Driven Fault Localization: Industrial Benchmarking and Lessons Learned at ABB Robotics

Software quality assurance remains a major challenge in industrial environments, where large-scale and long-lived systems inevitably accumulate defects. Identifying the location of a fault is often time-consuming and costly, particularly during maintenance phases when developers must rely primarily on textual bug reports rather than complete runtime or code-level context. In this study, we investigated whether artificial intelligence can support fault localization using only the natural-language content of bug reports. By relying only on textual information, our approach requires no access to source code, execution traces, or static analysis artifacts, making it directly deployable within existing industrial maintenance workflows. We framed fault localization as a supervised text classification problem and evaluated three traditional machine learning models (Logistic Regression, Support Vector Machine, and Random Forest) and two fine-tuned transformer-based language models (RoBERTa-Base and Distil-RoBERTa). Our evaluation used proprietary data from ABB Robotics in Sweden, comprising five years of resolved industrial bug reports, each linked to its verified code fix. This setting allowed us to assess model effectiveness under realistic industrial constraints. Our results showed that traditional models using term frequency-inverse document frequency (TF-IDF) features consistently outperformed the fine-tuned language models on this dataset, while data augmentation improved Random Forest performance. These findings challenge the assumption that transformer-based models universally outperform classical approaches in industrial contexts with domain-specific data. We demonstrated that historical bug reports can be systematically used for text-based, artificial intelligence-assisted fault localization, providing a scalable, low-cost, and empirically grounded complement to common debugging practices in industry.
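To make the text-classification framing concrete, the sketch below maps a bug report's text to a fault location label using TF-IDF features. It is an illustration under stated assumptions, not the pipeline evaluated in the study: the four bug reports and the labels `motion` and `ui` are invented, all function names are hypothetical, and a simple nearest-centroid rule stands in for the Logistic Regression, Support Vector Machine, and Random Forest classifiers so that the example needs only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data: (bug report text, fault location label).
# These reports and labels are invented; they are not from the ABB dataset.
TRAIN = [
    ("robot arm overshoots target position during fast motion", "motion"),
    ("joint speed limit ignored when motion path is blended", "motion"),
    ("pendant screen freezes after opening settings menu", "ui"),
    ("button label truncated on pendant settings screen", "ui"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training token."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf(text, idf):
    """Return an L2-normalised sparse TF-IDF vector (dict) for one report."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def fit_centroids(train, idf):
    """Average the TF-IDF vectors of each label (nearest-centroid stand-in)."""
    sums = defaultdict(Counter)
    n_per_label = Counter()
    for text, label in train:
        for tok, v in tfidf(text, idf).items():
            sums[label][tok] += v
        n_per_label[label] += 1
    return {label: {tok: v / n_per_label[label] for tok, v in vec.items()}
            for label, vec in sums.items()}


def predict(text, idf, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    vec = tfidf(text, idf)

    def cosine(centroid):
        dot = sum(v * centroid.get(tok, 0.0) for tok, v in vec.items())
        c_norm = math.sqrt(sum(x * x for x in centroid.values()))
        return dot / c_norm if c_norm else 0.0

    return max(centroids, key=lambda label: cosine(centroids[label]))


IDF = fit_idf([text for text, _ in TRAIN])
CENTROIDS = fit_centroids(TRAIN, IDF)

if __name__ == "__main__":
    print(predict("arm motion is jerky at high speed", IDF, CENTROIDS))
```

In a realistic setting the labels would come from the code fix linked to each resolved report, and the centroid rule would be replaced by one of the trained classifiers named above.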
