Autonomous Driving Systems (ADS) are safety-critical systems whose failures can have severe consequences. Although Metamorphic Testing (MT) is effective at detecting faults in ADS, existing methods rely heavily on manual effort and lack automation. We present AutoMT, a multi-agent MT framework powered by Large Language Models (LLMs) that automates both the extraction of Metamorphic Relations (MRs) from local traffic rules and the generation of valid follow-up test cases. AutoMT uses LLMs to extract MRs from traffic rules and express them in Gherkin syntax using a predefined ontology. A vision-language agent analyzes each scenario, and a search agent retrieves suitable MRs from a RAG-based database; follow-up test cases are then generated using computer vision techniques. Experiments show that AutoMT achieves up to 5× higher test diversity in follow-up case generation than the best baseline (manual, expert-defined MRs), as measured by validation rate, and detects up to 20.55% more behavioral violations. Whereas manual MT relies on a fixed set of predefined rules, AutoMT automatically extracts diverse MRs that augment real-world datasets and help uncover corner cases often missed during in-field testing and data collection. Its modular architecture, which separates MR extraction, filtering, and test generation, supports integration into industrial pipelines and could enable simulation-based testing to systematically cover underrepresented or safety-critical scenarios.
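To make the notion of an MR concrete, the following minimal Python sketch shows how a single relation in Given/When/Then form might be represented and checked against pairs of source and follow-up predictions. It is an illustration under stated assumptions, not the AutoMT implementation: the class `MetamorphicRelation`, the helpers `slows_down` and `count_violations`, the stop-sign rule, and the 10% deceleration threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class MetamorphicRelation:
    """Hypothetical MR in Given/When/Then form (illustrative, not AutoMT's schema)."""
    given: str  # precondition on the source scenario
    when: str   # transformation that produces the follow-up test case
    then: str   # expected change in ego-vehicle behavior
    check: Callable[[float, float], bool]  # (source_speed, follow_up_speed) -> MR holds?


def slows_down(source_speed: float, follow_up_speed: float,
               min_drop: float = 0.1) -> bool:
    """Return True if the follow-up speed is at least `min_drop` (a fraction) lower.

    The 10% default is an assumed threshold chosen only for this example.
    """
    return follow_up_speed <= source_speed * (1.0 - min_drop)


# Assumed example rule; the actual traffic rules used by the authors are not shown here.
STOP_SIGN_MR = MetamorphicRelation(
    given="the ego vehicle drives on a clear road",
    when="a stop sign is added at the roadside",
    then="the ego vehicle should slow down",
    check=slows_down,
)


def count_violations(mr: MetamorphicRelation,
                     speed_pairs: Iterable[Tuple[float, float]]) -> int:
    """Count (source, follow-up) speed pairs for which the MR does not hold."""
    return sum(1 for source, follow_up in speed_pairs
               if not mr.check(source, follow_up))


if __name__ == "__main__":
    # Predicted speeds (m/s) of a hypothetical ADS on source and follow-up images.
    pairs = [(10.0, 5.0), (10.0, 10.0), (10.0, 9.5)]
    print(count_violations(STOP_SIGN_MR, pairs))
```

In this sketch a violation is any follow-up prediction that fails the expected behavioral change; no ground-truth label is needed, which is the property that makes MT attractive for ADS testing.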