Ensuring software quality in embedded firmware is critical, especially in safety-critical domains, where compliance with functional safety standards such as ISO 26262 requires strong guarantees of software reliability. Although machine learning-based fault prediction models have demonstrated high accuracy, their lack of interpretability limits their adoption in industrial settings. Developers need actionable insights that can be applied directly in software quality assurance (SQA) processes and that guide defect mitigation strategies. In this paper, we present a structured process for defining context-specific software metric thresholds suitable for integration into fault detection workflows in industrial settings. Our approach supports cross-project fault prediction by deriving thresholds from one set of projects and applying them to independently developed firmware, thereby enabling reuse across similar software systems without retraining or domain-specific tuning. We analyze three real-world embedded C firmware projects provided by an industrial partner, using the Coverity and Understand static analysis tools to extract software metrics. Through statistical analysis and hypothesis testing, we identify discriminative metrics and derive empirical threshold values capable of distinguishing faulty from non-faulty functions. The derived thresholds are validated through an experimental evaluation, which demonstrates their effectiveness in identifying fault-prone functions with high precision. The results confirm that the derived thresholds can serve as an interpretable solution for fault prediction that aligns with industry standards and SQA practices. This approach provides a practical alternative to black-box AI models, allowing developers to systematically assess software quality, take preventive actions, and integrate metric-based fault prediction into industrial development workflows to mitigate software faults.
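To make the cross-project idea concrete, the following is a minimal sketch of deriving a threshold for one metric on training projects and then applying it unchanged to a separate project. It is not the procedure used in the paper: the abstract does not state how a cut-off is selected, so maximizing F1 over candidate cut-offs is an assumption made here, and the metric values, labels, and function names (`precision_recall`, `derive_threshold`) are all hypothetical.

```python
from typing import List, Tuple


def precision_recall(values: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Flag a function as fault-prone when its metric value is >= threshold,
    then compare the flags against the known fault labels."""
    tp = sum(1 for v, y in zip(values, labels) if v >= threshold and y)
    fp = sum(1 for v, y in zip(values, labels) if v >= threshold and not y)
    fn = sum(1 for v, y in zip(values, labels) if v < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def derive_threshold(values: List[float], labels: List[bool]) -> float:
    """Try every observed metric value as a cut-off and keep the one with
    the best F1 score (assumed selection criterion, not from the paper)."""
    best_t, best_f1 = values[0], -1.0
    for t in sorted(set(values)):
        p, r = precision_recall(values, labels, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


# Hypothetical per-function metric values (e.g. a complexity metric)
# and fault labels from the projects used to derive the threshold.
train_values = [2, 3, 4, 5, 12, 15, 18, 6]
train_labels = [False, False, False, False, True, True, True, False]

# Hypothetical data from an independently developed project.
target_values = [3, 14, 20, 7, 13]
target_labels = [False, True, True, False, False]

threshold = derive_threshold(train_values, train_labels)
precision, recall = precision_recall(target_values, target_labels, threshold)
print(f"threshold={threshold}, precision={precision:.2f}, recall={recall:.2f}")
```

The threshold is computed once and reused as a plain numeric rule, which is what keeps the prediction interpretable: a developer can see exactly which metric value caused a function to be flagged.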