Flaky tests, which pass or fail inconsistently without any code change, are a major challenge in software engineering in general and in quantum software engineering in particular, where the complexity and probabilistic nature of quantum programs make flakiness especially hard to diagnose; such tests hide real issues and waste developer effort. We aim to create an automated framework for detecting flaky tests in quantum software and an extended dataset of quantum flaky tests, overcoming the limitations of manual methods. Building on prior manual analysis of 14 quantum software repositories, we expanded the dataset and automated flaky test detection using transformer embeddings and cosine similarity. We conducted experiments with Large Language Models (LLMs) from the OpenAI GPT and Meta LLaMA families to assess their ability to detect and classify flaky tests from code and issue descriptions. Embedding transformers proved effective: we identified 25 new flaky tests, expanding the dataset by 54%. The top-performing LLMs achieved an F1-score of 0.8871 for flakiness detection but only 0.5839 for root cause identification. We introduce an automated flaky test detection framework based on machine learning that shows promising results, while highlighting the need for improved root cause detection and classification in large quantum codebases. Future work will focus on improving detection techniques and developing automatic fixes for flaky tests.
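To make the similarity-based detection step concrete, the sketch below illustrates the general idea of matching a candidate test against known flaky tests by cosine similarity over embedding vectors. All names, vectors, and the threshold value here are hypothetical, and the embeddings are assumed to be precomputed by some transformer encoder; this is an illustrative sketch, not the paper's actual pipeline.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical precomputed embeddings of tests already labeled as flaky.
known_flaky = {
    "test_backend_timeout": [0.9, 0.1, 0.3],
    "test_unseeded_sampler": [0.2, 0.8, 0.5],
}

# Hypothetical embedding of a candidate test to screen.
candidate = [0.88, 0.15, 0.28]

THRESHOLD = 0.95  # assumed similarity cutoff, tuned on labeled data

# Flag the candidate if it is close to any known flaky test.
scores = {name: cosine_similarity(candidate, emb)
          for name, emb in known_flaky.items()}
flagged = [name for name, score in scores.items() if score >= THRESHOLD]
print(flagged)
```

In practice the threshold trades precision for recall: a high cutoff flags only near-duplicates of known flaky tests, while a lower one surfaces more candidates for manual review.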