Governance Controls for AI-Generated Test Artifacts in Autonomous Software Testing

Artificial intelligence (AI) and large language models (LLMs) are increasingly used in autonomous software testing. However, AI-generated test artifacts often suffer from hallucinations, compliance violations, security risks, and limited explainability. To improve the reliability, transparency, and trustworthiness of these artifacts, this research introduces the Governance-Aware Autonomous Testing Framework (GATF). The framework extends the autonomous testing lifecycle with governance validation, explainability analysis, probabilistic risk assessment, compliance monitoring, and audit governance. Experiments were performed on the Defects4J and PROMISE software engineering datasets. The proposed framework reduced governance-related risks by 89.6% and achieved 94.3% governance accuracy, 96.5% artifact reliability, 94.2% compliance accuracy, and 90.8% explainability performance. The results indicate that governance-aware autonomous testing systems can significantly improve reliability, transparency, and operational security compared with conventional AI-based testing systems. The proposed architecture is scalable and reliable and provides a safe environment for software testing.
