STMutants: A Mutation Testing Dataset for Structured Text Programs in Industrial Automation

Mutation testing is widely used to evaluate test-suite effectiveness, yet IEC 61131-3 Structured Text (ST) programs still lack a publicly available benchmark that supports reproducible mutation-based research. This gap is especially important because ST is extensively used in Programmable Logic Controllers (PLCs) that operate in real-time, safety-critical industrial environments, where software faults may cause equipment damage, production loss, or unsafe system behavior. To address this need, we present STMutants, a curated mutation testing dataset for industrial automation software. STMutants contains 110 generated first-order mutants derived from 11 ST programs collected from the OSCAT basic library and industrially relevant sources, of which 108 are retained after observability and equivalence screening. The dataset covers seven mutation operator categories adapted from classical taxonomies for the PLC domain, namely value, relational, arithmetic, logical, negation, operation insertion/omission, and initialization faults. Each mutant is constructed through a four-phase methodology: fault-type profiling and operator selection, syntactic transformation, compilability verification, and manual equivalence screening with strong inter-rater agreement (kappa = 0.87). To demonstrate the usefulness of the dataset, we evaluate three large language models (LLMs) in a two-phase setting: test-suite generation followed by mutation kill/survive prediction. Across the 108 retained mutants, the models achieve mutation detection accuracies of 86.1%, 94.4%, and 86.1%, respectively, with statistical analysis confirming significant performance differences. By providing the first publicly available mutation benchmark for ST programs, STMutants enables reproducible research on automated test generation, mutation analysis, fault localization, and AI-assisted quality assurance for PLC software.
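To make the notion of a first-order mutant concrete, the following is a minimal Python sketch of the syntactic-transformation step for the relational category only. It is an illustration under stated assumptions, not the authors' tooling: the replacement table `REPLACEMENTS`, the function name `first_order_mutants`, and the sample ST snippet are all hypothetical, and the sketch does not skip ST comments or string literals.

```python
import re

# Tokens are matched longest-first so that ST assignment (":=") and
# output binding ("=>") are consumed whole and never mistaken for "=" or ">".
TOKEN = re.compile(r":=|=>|<>|<=|>=|<|>|=")

# Hypothetical relational-operator replacement table (an assumption for
# illustration; the dataset's actual operator definitions may differ).
REPLACEMENTS = {
    "<": ["<=", ">"],
    "<=": ["<", ">"],
    ">": [">=", "<"],
    ">=": [">", "<"],
    "=": ["<>"],
    "<>": ["="],
}


def first_order_mutants(source: str) -> list[str]:
    """Return one mutant per (relational operator occurrence, replacement).

    Each mutant differs from the original in exactly one operator,
    which is what makes it a first-order mutant.
    """
    mutants = []
    for match in TOKEN.finditer(source):
        op = match.group()
        # ":=" and "=>" are not in the table, so they are left untouched.
        for new_op in REPLACEMENTS.get(op, []):
            mutants.append(source[: match.start()] + new_op + source[match.end():])
    return mutants


if __name__ == "__main__":
    # Hypothetical ST fragment, not taken from the dataset.
    st_source = "IF level >= max_level THEN\n    valve := FALSE;\nEND_IF;"
    for i, mutant in enumerate(first_order_mutants(st_source), start=1):
        print(f"--- mutant {i} ---")
        print(mutant)
```

In a full pipeline each generated mutant would still need to pass the compilability and equivalence screening phases described above before being retained.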
