The widespread application of large language models (LLMs) underscores the importance of deep learning (DL) technologies that rely on foundational DL libraries such as PyTorch and TensorFlow. Despite their robust features, these libraries struggle to scale and to adapt to rapid advancements in the LLM community. In response, tech giants like Apple and Huawei are developing their own DL libraries to enhance performance, increase scalability, and safeguard intellectual property. Ensuring the security of these libraries is crucial, and fuzzing is a vital technique for doing so. However, existing fuzzing frameworks struggle with target flexibility, with effectively testing bug-prone API sequences, and with leveraging the limited information available for new libraries. To address these limitations, we propose FUTURE, the first universal fuzzing framework tailored for newly introduced and prospective DL libraries. FUTURE leverages historical bug information from existing libraries and fine-tunes LLMs for specialized code generation. This strategy helps identify bugs in new libraries and, in turn, uses insights from these new libraries to enhance security in existing ones, creating a cycle from history to future and back. To evaluate FUTURE's effectiveness, we conduct comprehensive experiments on three newly introduced DL libraries. The results demonstrate that FUTURE significantly outperforms existing fuzzers in bug detection, success rate of bug reproduction, validity rate of code generation, and API coverage. Notably, FUTURE has detected 148 bugs across 452 targeted APIs, including 142 previously unknown bugs. Among these, 10 have been assigned CVE IDs. Additionally, FUTURE detects 7 bugs in PyTorch, demonstrating that insights gained from new libraries can in turn enhance security in existing ones.