Compliance testing in highly regulated domains is crucial but largely manual, requiring domain experts to translate complex regulations into executable test cases. While large language models (LLMs) show promise for automation, their susceptibility to hallucinations limits reliable application. Existing hybrid approaches mitigate this issue by constraining LLMs with formal models, but they still rely on costly manual modeling. To address this problem, this paper proposes RAFT, a framework for requirements auto-formalization and compliance test generation that explicates tacit regulatory knowledge from multiple LLMs. RAFT employs an Adaptive Purification-Aggregation strategy to elicit this tacit knowledge and consolidate it into three artifacts: a domain meta-model, a formal requirements representation, and testability constraints. These artifacts are then dynamically injected into prompts to guide high-precision requirement formalization and automated test generation. Experiments across the financial, automotive, and power domains show that RAFT achieves expert-level performance and substantially outperforms state-of-the-art (SOTA) methods while reducing overall generation and review time.