Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security defects. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by pretrained models with Code Property Graph (CPG) representations extracted by graph neural networks, under the implicit assumption that introducing an additional modality necessarily yields information gain. Through empirical analysis, we demonstrate the limitations of this assumption: pretrained models already encode substantial structural information implicitly, leading to strong overlap between the two modalities; moreover, graph encoders are generally less effective than pretrained language models in feature extraction. As a result, naive fusion not only struggles to obtain complementary signals but can also dilute effective discriminative cues due to noise propagation. To address these challenges, we propose a task-conditioned complementary fusion strategy that uses Fisher information to quantify task relevance, transforming cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, this strategy significantly tightens the upper bound on the output error. Based on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism to enable efficient task-oriented fusion. Experiments on the BigVul, Devign, and ReVeal benchmarks demonstrate that TaCCS-DFA delivers up to a 6.3-point gain in F1 score with only a 3.4% increase in inference latency, while maintaining low calibration error.
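
To make the idea of selective fusion within a task-sensitive subspace concrete, the following is a minimal NumPy sketch, not the TaCCS-DFA implementation. It assumes per-sample gradients of the task loss with respect to a shared feature space are already available, estimates a rank-k subspace from the empirical Fisher matrix in batch form (the online estimation named above is not reproduced), and injects only the in-subspace part of the graph features through a sigmoid gate. The names `fisher_subspace`, `gated_fusion`, `gate_w`, and `gate_b` are hypothetical and chosen for illustration.

```python
import numpy as np


def fisher_subspace(grads, k):
    """Return an orthonormal basis (d, k) for the top-k eigenvectors of the
    empirical Fisher matrix F = G^T G / n, where G (n, d) stacks per-sample
    gradients. Assumption: batch estimation, not an online update."""
    # The right singular vectors of G are the eigenvectors of G^T G,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:k].T


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(h_seq, h_graph, basis, gate_w, gate_b):
    """Fuse sequence features h_seq (n, d) with graph features h_graph (n, d),
    restricting the graph contribution to the Fisher subspace.
    gate_w has shape (2k, 1) and gate_b is a scalar (both hypothetical)."""
    z_seq = h_seq @ basis        # (n, k) task-sensitive coordinates
    z_graph = h_graph @ basis    # (n, k)
    # One scalar gate per sample, computed from both projected views.
    gate = sigmoid(np.concatenate([z_seq, z_graph], axis=1) @ gate_w + gate_b)
    # Map the graph coordinates back to feature space; anything orthogonal
    # to the subspace is discarded before fusion.
    graph_in_subspace = z_graph @ basis.T
    return h_seq + gate * graph_in_subspace
```

In this sketch the graph modality can only change the fused representation along the k task-sensitive directions, which is one simple way to keep task-irrelevant graph noise from propagating into the classifier input.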
