Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic, while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information embedded in comments and thus limiting their generalization across complex code structures and logical relationships. To address this, we propose MultiVul, a multimodal contrastive framework that aligns code and comment representations through dual similarity learning and consistency regularization, augmented with diverse code-text pairs to improve robustness. Experiments on the widely adopted DiverseVul and Devign datasets across four large language models (LLMs), namely DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, and CodeLlama-7B, show that MultiVul achieves up to 27.07% F1 improvement over prompting-based methods and up to 13.37% over code-only fine-tuning, while maintaining comparable inference efficiency.
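The abstract does not spell out the loss used to align code and comment representations. As a rough illustration of what contrastive alignment between two modalities can look like, the following minimal sketch computes a symmetric InfoNCE-style loss over a batch of paired embeddings. This is a generic formulation, not MultiVul's actual objective: the function names, the use of cosine similarity, and the temperature value of 0.07 are all assumptions made for illustration, and the consistency regularization term is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _diagonal_cross_entropy(sim_rows):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(sim_rows):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(sim_rows)


def symmetric_contrastive_loss(code_emb, comment_emb, temperature=0.07):
    """Hypothetical two-direction contrastive loss (assumed, not from the paper).

    code_emb[i] and comment_emb[i] are treated as a matching pair; every
    other comment in the batch acts as a negative for code i, and vice versa.
    """
    n = len(code_emb)
    # Scaled similarity of every code snippet against every comment.
    sim = [[cosine(c, t) / temperature for t in comment_emb] for c in code_emb]
    # Transpose to score every comment against every code snippet.
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    # Average the code-to-comment and comment-to-code directions.
    return 0.5 * (_diagonal_cross_entropy(sim) + _diagonal_cross_entropy(sim_t))


if __name__ == "__main__":
    code = [[1.0, 0.0], [0.0, 1.0]]
    aligned_comments = [[1.0, 0.0], [0.0, 1.0]]
    shuffled_comments = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned: ", symmetric_contrastive_loss(code, aligned_comments))
    print("shuffled:", symmetric_contrastive_loss(code, shuffled_comments))
```

In this toy setup, the loss is lower when each code embedding points in the same direction as its paired comment embedding than when the pairs are shuffled, which is the behavior an alignment objective of this kind is meant to reward. In practice the embeddings would come from the LLM being fine-tuned rather than from hand-written vectors.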