This paper addresses the urgent need for messaging standards in the operational test and evaluation (T&E) of machine learning (ML) applications, particularly edge ML applications embedded in systems such as robots, satellites, and unmanned vehicles. It examines whether IEEE Standard 1671 (IEEE Std 1671), the Automatic Test Markup Language (ATML), an XML-based standard originally developed for electronic systems, is suitable for testing ML applications. The paper explores extending IEEE Std 1671 to cover the challenges specific to ML applications, including the use of datasets and dependencies on software. By modeling several tests, such as adversarial robustness and drift detection, it offers a framework that can be adapted to specific applications and suggests that minor modifications to ATML might suffice to address what is new in ML. The paper distinguishes ATML's focus on testing from other ML standards such as the Predictive Model Markup Language (PMML) and the Open Neural Network Exchange (ONNX), which concentrate on ML model specification. We conclude that ATML is a promising tool for effective, near real-time operational T&E of ML applications, an essential aspect of AI lifecycle management, safety, and governance.
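To make the idea of an XML test message for an ML application concrete, the following is a minimal sketch of how the outcome of a drift detection test might be serialized and read back. It is an illustration only: the element and attribute names (`MLTestResult`, `Dataset`, `Measurement`, `Limit`, `Outcome`), the metric name, the dataset URI, and the numeric values are all hypothetical and are not taken from the IEEE Std 1671 schema or from the paper's models.

```python
import xml.etree.ElementTree as ET


def build_test_result(test_name, dataset_uri, metric, value, threshold):
    """Serialize one ML test outcome as a small XML message.

    All element and attribute names here are hypothetical placeholders,
    not the actual IEEE Std 1671 (ATML) schema.
    """
    root = ET.Element("MLTestResult", name=test_name)
    # Reference to the dataset used by the test, one of the ML-specific
    # dependencies a test description would need to carry.
    ET.SubElement(root, "Dataset", uri=dataset_uri)
    # The measured value of the test metric.
    measurement = ET.SubElement(root, "Measurement", metric=metric)
    measurement.text = repr(value)
    # The limit the measurement is compared against (less than or equal).
    limit = ET.SubElement(root, "Limit", comparator="LE")
    limit.text = repr(threshold)
    # Pass/fail outcome derived from the comparison.
    outcome = ET.SubElement(root, "Outcome")
    outcome.text = "Passed" if value <= threshold else "Failed"
    return ET.tostring(root, encoding="unicode")


def parse_outcome(xml_text):
    """Return (test name, outcome) from a message built above."""
    root = ET.fromstring(xml_text)
    return root.get("name"), root.findtext("Outcome")


# Hypothetical drift test: the statistic exceeds its limit, so the test fails.
message = build_test_result(
    "drift_detection", "file:///data/reference.csv", "ks_statistic", 0.31, 0.2
)
print(message)
print(parse_outcome(message))
```

A receiver only needs the agreed element names to interpret the message, which is the property that makes a shared markup standard attractive for near real-time operational T&E; a real ATML-based message would follow the standard's own schemas rather than the placeholder names used here.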