Cluster-Aware Dual-Level Test Specification Generation for Large-Scale Automotive Software Requirements

Generating test specifications that satisfy Automotive SPICE SWE.6 requirements grows more difficult and time-consuming as projects scale to thousands of requirements. Because the manual process often consumes weeks of engineering effort, automation is essential. However, standard Large Language Model (LLM) approaches struggle at this scale: processing requirements individually discards vital inter-requirement dependencies, while feeding the entire corpus at once exceeds context-window limits. The result is incomplete integration coverage and redundant test cases. This paper presents a novel "Cluster-then-Summarize" pipeline that addresses these limitations in three stages. First, requirements are embedded using sentence transformers and grouped using UMAP dimensionality reduction followed by HDBSCAN density-based clustering, with the minimum cluster size selected automatically by a quality criterion that combines normalized Silhouette and Calinski-Harabasz scores. Next, a multi-level map-reduce summarization algorithm distills each cluster into a concise, domain-conformant description while preserving quantitative thresholds and safety integrity levels. Finally, the pipeline exploits the resulting cluster topology to generate test specifications at two levels: individual requirement verification tests and cluster-level integration tests that verify cross-requirement feature behavior. A nearby-cluster context mechanism provides bounded cross-feature awareness during each LLM call, and Retrieval-Augmented Generation grounds all outputs in the ISO 26262 and ASPICE standards. Evaluation on automotive requirement datasets of varying scale demonstrates that, compared to baseline methods, the cluster-aware approach improves integration test coverage and maintains summarization fidelity while scaling efficiently to thousands of requirements.
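The automatic minimum cluster size selection can be illustrated with a minimal sketch. The abstract states only that the criterion combines normalized Silhouette and Calinski-Harabasz scores; the min-max normalization, the equal weighting, the function names, and the example numbers below are all assumptions made for illustration, not details taken from the paper. The scores are assumed to have been computed beforehand for each candidate size.

```python
from typing import Dict, Tuple


def min_max_normalize(values: Dict[int, float]) -> Dict[int, float]:
    """Rescale scores to [0, 1]; a constant series maps to 0.0 everywhere."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span > 0 else 0.0 for k, v in values.items()}


def select_min_cluster_size(
    silhouette: Dict[int, float],
    calinski_harabasz: Dict[int, float],
    weight: float = 0.5,  # assumed equal weighting, not taken from the paper
) -> Tuple[int, float]:
    """Return the candidate size with the highest combined normalized score."""
    if not silhouette or silhouette.keys() != calinski_harabasz.keys():
        raise ValueError("both score tables must cover the same non-empty candidate set")
    sil = min_max_normalize(silhouette)
    ch = min_max_normalize(calinski_harabasz)
    # Weighted sum of the two normalized scores (hypothetical combination rule).
    combined = {k: weight * sil[k] + (1.0 - weight) * ch[k] for k in sil}
    best = max(combined, key=combined.get)
    return best, combined[best]


# Hypothetical scores for four candidate sizes (illustrative numbers only).
sil_scores = {5: 0.21, 10: 0.35, 15: 0.33, 20: 0.18}
ch_scores = {5: 120.0, 10: 210.0, 15: 260.0, 20: 150.0}
best_size, best_score = select_min_cluster_size(sil_scores, ch_scores)
```

Normalizing before combining matters because the Calinski-Harabasz score is unbounded while the Silhouette score lies in [-1, 1]; without rescaling, the former would dominate any sum of the two.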
