Deep learning (DL) compilers are core infrastructure in modern DL systems, offering flexibility and scalability beyond vendor-specific libraries. This work uncovers a fundamental vulnerability in their design: can an official, unmodified compiler alter a model's semantics during compilation and introduce hidden backdoors? We study both adversarial and natural settings. In the adversarial setting, we craft seemingly benign models in which triggers have no effect before compilation but activate hidden backdoors afterward. Evaluated on six models, three commercial compilers, and two hardware platforms, the attack achieves a 100% success rate on triggered inputs while preserving normal accuracy and evading state-of-the-art detectors. It also generalizes across compilers, hardware, and floating-point settings. In the natural setting, we analyze the top 100 HuggingFace models (including one with over 220 million downloads) and find natural triggers in 31 of them, showing that compilers can introduce risks even without adversarial manipulation. Our results reveal an overlooked threat: unmodified DL compilers can silently alter model semantics. To our knowledge, this is the first work to expose inherent security risks in DL compiler design, opening a new direction for secure and trustworthy ML.