Despite their wide adoption in various domains (e.g., healthcare, finance, software engineering), Deep Learning (DL)-based applications suffer from many bugs, failures, and vulnerabilities. Reproducing these bugs is essential for resolving them, but it is extremely challenging due to the inherent nondeterminism of DL models and their tight coupling with hardware and software environments. According to recent studies, only about 3% of DL bugs can be reliably reproduced using manual approaches. To address these challenges, we present RepGen, a novel, automated, and intelligent approach for reproducing DL bugs. RepGen constructs a learning-enhanced context from a project, develops a comprehensive plan for bug reproduction, and employs an iterative generate-validate-refine mechanism in which an LLM generates code that reproduces the bug at hand. We evaluate RepGen on 106 real-world DL bugs and achieve a reproduction rate of 80.19%, a 19.81% improvement over the state-of-the-art measure. A developer study involving 27 participants shows that RepGen improves the success rate of DL bug reproduction by 23.35%, reduces the time to reproduce by 56.8%, and lowers participants' cognitive load.
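The iterative generate-validate-refine mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not RepGen's implementation: the abstract does not specify the loop's interfaces, so every name here (`Attempt`, `generate_validate_refine`, `stub_generate`, `validate_raises_zero_division`) is hypothetical, a hard-coded stub stands in for the LLM, and a `ZeroDivisionError` stands in for the DL bug being reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One iteration of the loop: the candidate code and how validation went."""
    code: str
    feedback: str
    reproduced: bool


def generate_validate_refine(
    generate: Callable[[str], str],
    validate: Callable[[str], Tuple[bool, str]],
    max_iters: int = 5,
) -> Tuple[Optional[str], List[Attempt]]:
    """Ask the generator for code, validate it, and feed the validation
    feedback into the next generation round until the bug is reproduced
    or the iteration budget runs out."""
    history: List[Attempt] = []
    feedback = ""  # empty feedback marks the first attempt
    for _ in range(max_iters):
        code = generate(feedback)
        reproduced, feedback = validate(code)
        history.append(Attempt(code, feedback, reproduced))
        if reproduced:
            return code, history
    return None, history


def stub_generate(feedback: str) -> str:
    """Hypothetical stand-in for an LLM call: the first candidate misses the
    bug, and any candidate produced after feedback triggers it."""
    return "x = 1" if not feedback else "x = 1 / 0"


def validate_raises_zero_division(code: str) -> Tuple[bool, str]:
    """Run the candidate in a fresh namespace and check for the expected
    failure (assumed here to be a ZeroDivisionError)."""
    try:
        exec(code, {})
    except ZeroDivisionError:
        return True, "reproduced the expected ZeroDivisionError"
    except Exception as exc:
        return False, f"raised a different error: {exc!r}"
    return False, "script ran without raising the expected error"
```

Calling `generate_validate_refine(stub_generate, validate_raises_zero_division)` returns the second candidate together with a two-entry history. In a real setting the validator would presumably run the candidate in an isolated environment and compare the observed symptoms against the bug report, rather than calling `exec` in-process.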