The widespread application of large language models (LLMs) underscores the importance of deep learning (DL) technologies, which rely on foundational DL libraries such as PyTorch and TensorFlow. Despite their robust features, these libraries face challenges in scalability and in adapting to rapid advances in the LLM community. In response, tech giants such as Apple and Huawei are developing their own DL libraries to improve performance, increase scalability, and safeguard intellectual property. Ensuring the security of these libraries is crucial, and fuzzing is a vital means of doing so. However, existing fuzzing frameworks struggle to adapt flexibly to new targets, to test bug-prone API sequences effectively, and to leverage the limited information available for new libraries. To address these limitations, we propose FUTURE, the first universal fuzzing framework tailored for newly introduced and prospective DL libraries. FUTURE leverages historical bug information from existing libraries and fine-tunes LLMs for specialized code generation. This strategy helps identify bugs in new libraries and, in turn, uses insights from these libraries to enhance security in existing ones, creating a cycle from history to future and back. To assess FUTURE's effectiveness, we conduct comprehensive experiments on three newly introduced DL libraries. The results demonstrate that FUTURE significantly outperforms existing fuzzers in bug detection, success rate of bug reproduction, validity rate of code generation, and API coverage. Notably, FUTURE has detected 148 bugs across 452 targeted APIs, including 142 previously unknown bugs. Among these, 10 have been assigned CVE IDs. Additionally, FUTURE detects 7 bugs in PyTorch, demonstrating that it can also enhance the security of existing libraries.