Mutation testing is widely used to evaluate test-suite effectiveness, yet IEC 61131-3 Structured Text (ST) programs still lack a publicly available benchmark that supports reproducible mutation-based research. This gap matters because ST is extensively used in Programmable Logic Controllers (PLCs) that operate in real-time, safety-critical industrial environments, where software faults may cause equipment damage, production loss, or unsafe system behavior. To address this need, we present STMutants, a curated mutation testing dataset for industrial automation software. STMutants contains 110 generated first-order mutants derived from 11 ST programs collected from the OSCAT basic library and industrially relevant sources, of which 108 are retained after observability and equivalence screening. The dataset covers seven mutation operator categories adapted from classical taxonomies for the PLC domain: value, relational, arithmetic, logical, negation, operation insertion/omission, and initialization faults. Each mutant is constructed through a four-phase methodology: fault-type profiling and operator selection, syntactic transformation, compilability verification, and manual equivalence screening with strong inter-rater agreement (kappa = 0.87). To demonstrate the usefulness of the dataset, we evaluate three large language models (LLMs) in a two-phase setting: test-suite generation followed by mutant kill/survive prediction. Across the 108 retained mutants, the models achieve mutation detection accuracies of 86.1%, 94.4%, and 86.1%, respectively, with statistical analysis confirming significant performance differences. By providing the first publicly available mutation benchmark for ST programs, STMutants enables reproducible research on automated test generation, mutation analysis, fault localization, and AI-assisted quality assurance for PLC software.
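To make the notion of a first-order mutant in the relational category concrete, the following is a minimal Python sketch that applies one relational operator replacement at a time to an ST fragment. The replacement map, the function name `first_order_mutants`, and the ST fragment are assumptions made for this illustration; they are not the operator definitions or programs used in STMutants.

```python
import re

# Relational operator replacement sketch for Structured Text.
# This mapping is an illustrative assumption, not the STMutants operator set.
ROR_MAP = {">": ">=", "<": "<=", ">=": ">", "<=": "<", "=": "<>", "<>": "="}

# Two-character operators are listed first so they win over "<", ">" and "=".
# The lookbehind keeps the "=" inside the ":=" assignment from being mutated.
RELATIONAL = re.compile(r"<=|>=|<>|(?<!:)=|<|>")


def first_order_mutants(source: str) -> list[str]:
    """Return one mutant per relational operator, each with a single change."""
    mutants = []
    for match in RELATIONAL.finditer(source):
        replacement = ROR_MAP[match.group()]
        mutants.append(source[:match.start()] + replacement + source[match.end():])
    return mutants


# Hypothetical ST fragment written for this example.
ORIGINAL = "IF temperature > limit THEN\n    alarm := TRUE;\nEND_IF;"

for mutant in first_order_mutants(ORIGINAL):
    print(mutant)
```

The single mutant produced here changes `>` to `>=`, so only a test with `temperature` equal to `limit` can distinguish it from the original. The sketch works on raw text and does not skip comments, string literals, or the `=>` output-parameter syntax, which a real mutation tool would need to handle.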