MirrorFuzz: Leveraging LLM and Shared Bugs for Deep Learning Framework APIs Fuzzing

Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can cascade into critical issues in higher-level applications, jeopardizing reliability and security. While numerous techniques have been proposed to detect bugs in DL frameworks, research exploring common API patterns across frameworks and the potential risks they entail remains limited. Notably, many DL frameworks expose similar APIs with overlapping input parameters and functionalities, rendering them vulnerable to shared bugs, where a flaw in one API may extend to analogous APIs in other frameworks. To address this challenge, we propose MirrorFuzz, an automated API fuzzing solution to discover shared bugs in DL frameworks. MirrorFuzz operates in three stages. First, it collects historical bug data for each API within a DL framework to identify potentially buggy APIs. Second, it matches each buggy API in a specific framework with similar APIs, both within the same framework and across other DL frameworks. Third, it employs large language models (LLMs) to synthesize code for the API under test, leveraging the historical bug data of similar APIs to trigger analogous bugs across APIs. We implement MirrorFuzz and evaluate it on four popular DL frameworks (TensorFlow, PyTorch, OneFlow, and Jittor). Extensive evaluation demonstrates that MirrorFuzz improves code coverage by 39.92% and 98.20% compared to state-of-the-art methods on TensorFlow and PyTorch, respectively. Moreover, MirrorFuzz discovers 315 bugs, 262 of which are newly found; 80 bugs have been fixed, and 52 of these bugs have been assigned CNVD IDs.

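The three stages described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not MirrorFuzz's implementation: the abstract does not say how API similarity is computed or how prompts are worded, so the `ApiRecord` type, the name-plus-Jaccard score, the `0.6` threshold, the prompt text, and the framework and API names (`fw_a.conv2d` and so on) are all hypothetical. No LLM is called; the sketch stops at building the prompt string.

```python
from dataclasses import dataclass, field


@dataclass
class ApiRecord:
    """Hypothetical record for one framework API (not from the paper)."""
    framework: str
    name: str
    params: frozenset
    bug_reports: list = field(default_factory=list)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x, y):
    """Toy score (an assumption): short-name equality averaged with parameter overlap."""
    same_name = 1.0 if x.name.split(".")[-1] == y.name.split(".")[-1] else 0.0
    return 0.5 * same_name + 0.5 * jaccard(x.params, y.params)


def find_buggy_apis(apis):
    """Stage 1: keep APIs that have at least one historical bug report."""
    return [a for a in apis if a.bug_reports]


def match_similar(buggy, apis, threshold=0.6):
    """Stage 2: other APIs, in any framework, scoring at or above the threshold."""
    return [a for a in apis if a is not buggy and similarity(buggy, a) >= threshold]


def build_prompt(target, source):
    """Stage 3: prompt text asking an LLM to adapt known bug triggers to the target API."""
    reports = "\n".join(f"- {r}" for r in source.bug_reports)
    return (
        f"Known bugs in {source.framework} API {source.name}:\n{reports}\n"
        f"Write a test program for {target.framework} API {target.name} "
        f"that tries to trigger an analogous bug."
    )


# Made-up data: two frameworks with a similar convolution API and one unrelated API.
apis = [
    ApiRecord("FrameworkA", "fw_a.conv2d",
              frozenset({"input", "filters", "strides", "padding"}),
              ["placeholder report: crash on a zero stride"]),
    ApiRecord("FrameworkB", "fw_b.conv2d",
              frozenset({"input", "weight", "stride", "padding"})),
    ApiRecord("FrameworkB", "fw_b.abs", frozenset({"input"})),
]

buggy = find_buggy_apis(apis)
matches = match_similar(buggy[0], apis)
prompt = build_prompt(matches[0], buggy[0])
print(prompt)
```

In this toy data only `fw_b.conv2d` clears the threshold, so the bug history of `fw_a.conv2d` is reused to probe it. A real system would need a far richer similarity signal (documentation, signatures, semantics) than this name-and-parameter score.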