Machine Learning Data Suitability and Performance Testing Using Fault Injection Testing Framework

Resilient machine learning (ML) systems are necessary for production-ready deployments that earn user confidence. In data-sensitive systems, the quality of both the input data and the model strongly influences the success of end-to-end testing. However, approaches for testing input data are fewer and less systematic than those for testing models. To address this gap, this paper presents the Fault Injection for Undesirable Learning in input Data (FIUL-Data) testing framework, which tests the resilience of ML models to multiple intentionally triggered data faults. Data mutators expose vulnerabilities of ML systems to the effects of different fault injections. The proposed framework is built on three main ideas: the mutators are not random, only one data mutator is applied at a time, and the selected ML models are optimized beforehand. This paper evaluates the FIUL-Data framework using data from analytical chemistry, comprising retention time measurements of antisense oligonucleotides. The empirical evaluation is a two-step process in which the responses of the selected ML models to data mutation are first analyzed individually and then compared with each other. The results show that the FIUL-Data framework allows the resilience of ML models to be evaluated. In most experimental cases, ML models show higher resilience with larger training datasets, and gradient boosting performed better than support vector regression on smaller training sets. Overall, the mean squared error metric is useful for evaluating the resilience of models because of its higher sensitivity to data mutation.
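The workflow summarized above (deterministic mutators, one mutator applied at a time, and mean squared error as the resilience signal) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the mutator names `drop_every_kth` and `shift_every_kth_target`, the synthetic data, and the ordinary least-squares line that stands in for the tuned gradient boosting and support vector regression models are hypothetical and are not taken from the FIUL-Data implementation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for a pre-optimized model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical deterministic mutators: each takes and returns (xs, ys).
def drop_every_kth(xs, ys, k=3):
    """Data reduction fault: remove every k-th training sample."""
    keep = [i for i in range(len(xs)) if (i + 1) % k != 0]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def shift_every_kth_target(xs, ys, k=4, offset=5.0):
    """Label fault: add a fixed offset to every k-th target value."""
    return list(xs), [y + offset if i % k == 0 else y for i, y in enumerate(ys)]


def resilience_report(train, test, mutators):
    """Train on clean data, then on each mutated copy (one mutator at a time),
    and record the MSE on the same clean test set."""
    report = {"baseline": mse(fit_line(*train), *test)}
    for name, mutate in mutators.items():
        report[name] = mse(fit_line(*mutate(*train)), *test)
    return report


# Synthetic data following y = 2x + 1 (not the oligonucleotide dataset).
train_x = [float(i) for i in range(20)]
train_y = [2.0 * x + 1.0 for x in train_x]
test_x = [0.5 + i for i in range(10)]
test_y = [2.0 * x + 1.0 for x in test_x]

report = resilience_report(
    (train_x, train_y),
    (test_x, test_y),
    {
        "drop_every_3rd": drop_every_kth,
        "shift_every_4th_target": shift_every_kth_target,
    },
)
for name, value in report.items():
    print(f"{name}: MSE = {value:.4f}")
```

Comparing each mutated MSE with the baseline gives a per-mutator view of how much a given fault degrades the model. Repeating the run at several training set sizes and with different models would mirror the two-step comparison described in the abstract.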
