Open-source software is widely used because it is convenient. Open-source maintainers often fix vulnerabilities silently, for legitimate reasons, which exposes users who are unaware of the updates to threats. Previous work focuses exclusively on black-box binary detection of silent dependency alerts, which suffers from high false-positive rates and leaves open-source software users to analyze and explain the AI predictions themselves. Explainable AI has emerged as a complement to black-box AI models, providing details in various forms to explain AI decisions. Observing that no existing technique can discover silent dependency alerts in time, we propose a framework that combines an encoder-decoder model with a binary detector to provide explainable silent dependency alert prediction. Our model generates four types of vulnerability key aspects (vulnerability type, root cause, attack vector, and impact) to enhance the trustworthiness of alert predictions and users' acceptance of them. Through experiments with several models and inputs, we confirm that CodeBERT with both commit messages and code changes achieves the best results. Our user study shows that explainable alert predictions help users find silent dependency alerts more easily than black-box predictions. To the best of our knowledge, this is the first research work to apply Explainable AI to silent dependency alert prediction, and it opens the door for related domains.
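As an illustration only, the shape of an explainable alert prediction can be sketched as a binary score paired with the four generated key aspects. Everything in this sketch is an assumption made for exposition rather than the framework's actual implementation: the names `AlertPrediction`, `build_input`, and `predict_alert`, the `[DIFF]` separator, the 0.5 threshold, the choice to generate aspects only for positive detections, and the toy stand-ins for the binary detector and the encoder-decoder model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# The four vulnerability key aspects named in the abstract.
ASPECTS = ("vulnerability_type", "root_cause", "attack_vector", "impact")


@dataclass
class AlertPrediction:
    is_alert: bool                     # binary detector decision
    score: float                       # detector confidence in [0, 1]
    aspects: Optional[Dict[str, str]]  # generated explanation, or None


def build_input(commit_message: str, code_diff: str) -> str:
    # Assumption: message and code change are joined with a marker token.
    return f"{commit_message}\n[DIFF]\n{code_diff}"


def predict_alert(
    commit_message: str,
    code_diff: str,
    detector: Callable[[str], float],
    generator: Callable[[str, str], str],
    threshold: float = 0.5,            # hypothetical decision threshold
) -> AlertPrediction:
    text = build_input(commit_message, code_diff)
    score = detector(text)
    if score < threshold:
        # Assumption: no explanation is generated for negative predictions.
        return AlertPrediction(False, score, None)
    aspects = {name: generator(text, name) for name in ASPECTS}
    return AlertPrediction(True, score, aspects)


# Toy stand-ins so the sketch runs without any trained model.
def toy_detector(text: str) -> float:
    return 0.9 if "overflow" in text.lower() else 0.1


def toy_generator(text: str, aspect: str) -> str:
    return f"<generated {aspect}>"


if __name__ == "__main__":
    result = predict_alert(
        "Fix buffer overflow in header parser",
        "- memcpy(buf, src, n);\n+ memcpy(buf, src, min(n, sizeof(buf)));",
        toy_detector,
        toy_generator,
    )
    print(result.is_alert, sorted(result.aspects))
```

In a real setting the two callables would wrap trained models; the point of the sketch is only that a user receives the four aspects alongside the binary decision instead of a bare yes/no label.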