Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security defects. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by pretrained models with Code Property Graph (CPG) representations extracted by graph neural networks, under the implicit assumption that introducing an additional modality necessarily yields information gain. Through empirical analysis, we demonstrate the limitations of this assumption: pretrained models already encode substantial structural information implicitly, leading to strong overlap between the two modalities; moreover, graph encoders are generally less effective than pretrained language models in feature extraction. As a result, naive fusion not only struggles to obtain complementary signals but can also dilute effective discriminative cues due to noise propagation. To address these challenges, we propose a task-conditioned complementary fusion strategy that uses Fisher information to quantify task relevance, transforming cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, this strategy significantly tightens the upper bound on the output error. Based on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism to enable efficient task-oriented fusion. Experiments on the BigVul, Devign, and ReVeal benchmarks demonstrate that TaCCS-DFA delivers up to a 6.3-point gain in F1 score with only a 3.4% increase in inference latency, while maintaining low calibration error.
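As an illustration only, the idea of fusing within a task-sensitive subspace can be sketched in a few lines of NumPy. This is not the TaCCS-DFA implementation. The function names `fisher_subspace` and `gated_fusion`, the gate parameters `w_gate` and `b_gate`, and the residual form of the fusion are all hypothetical. The sketch also assumes that both modalities share one feature dimension, and it uses a batch SVD of per-sample gradients in place of the online low-rank estimator.

```python
import numpy as np

def fisher_subspace(grads, rank):
    """Top-`rank` eigenvectors of the empirical Fisher matrix F = mean(g g^T).

    grads: (n_samples, dim) per-sample loss gradients w.r.t. the features.
    Assumption: a batch SVD stands in for an online low-rank estimator.
    """
    # Right singular vectors of grads / sqrt(n) are the eigenvectors of F,
    # and the squared singular values are the matching eigenvalues.
    _, s, vt = np.linalg.svd(grads / np.sqrt(len(grads)), full_matrices=False)
    return vt[:rank].T, s[:rank] ** 2  # basis (dim, rank), eigenvalues (rank,)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_ncs, h_cpg, basis, w_gate, b_gate):
    """Fuse CPG features into NCS features inside the Fisher subspace only.

    h_ncs, h_cpg: (batch, dim) features from the two modalities.
    w_gate: (2 * rank, rank), b_gate: (rank,) hypothetical gate parameters.
    """
    z_ncs = h_ncs @ basis  # (batch, rank) task-sensitive coordinates
    z_cpg = h_cpg @ basis
    gate = sigmoid(np.concatenate([z_ncs, z_cpg], axis=-1) @ w_gate + b_gate)
    # Assumed residual form: move NCS toward CPG along task-relevant
    # directions and leave the orthogonal complement of h_ncs untouched.
    return h_ncs + (gate * (z_cpg - z_ncs)) @ basis.T
```

With a zero-initialized gate (`w_gate` and `b_gate` all zeros), every gate value is 0.5, so the fused features sit halfway between the two modalities inside the subspace and equal the NCS features outside it. Restricting the update to the subspace is what keeps noise in task-irrelevant CPG directions from reaching the output.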
