Artificial Intelligence (AI) compilers are critical for efficiently deploying AI models across diverse hardware platforms. However, they remain prone to bugs that can compromise both compiler reliability and model correctness, so ensuring the quality of AI compilers is crucial. In this work, we present a unified data-driven testing framework that systematically addresses stage-specific challenges in AI compilers. Specifically, OPERA migrates tests from AI libraries to exercise the diverse operator conversion logic in the model loading stage. OATest synthesizes diverse optimization-aware computational graphs for testing high-level optimizations. HARMONY generates and mutates diverse low-level IR seeds to produce hardware-optimization-aware tests for low-level optimizations. Together, these techniques provide a comprehensive, stage-aware framework that improves testing coverage and effectiveness, detecting 266 previously unknown bugs in four widely used AI compilers.