Malicious URL detection remains a critical cybersecurity challenge as adversaries increasingly employ sophisticated evasion techniques, including obfuscation, character-level perturbations, and adversarial attacks. Although pre-trained language models (PLMs) such as BERT have shown potential for URL analysis tasks, three limitations persist in current implementations: (1) an inability to effectively model the non-natural hierarchical structure of URLs, (2) insufficient sensitivity to character-level obfuscation, and (3) a lack of mechanisms for incorporating auxiliary network-level signals such as IP addresses. Addressing all three is essential for robust detection. To this end, we propose CURL-IP, a multi-modal detection framework incorporating three key innovations: (1) a Token-Contrastive Representation Enhancer, which refines subword token representations through token-aware contrastive learning to produce more discriminative and isotropic embeddings; (2) a Cross-Layer Multi-Scale Aggregator, which hierarchically aggregates Transformer outputs via convolutional operations and gated MLPs to capture both local and global semantic patterns across layers; and (3) a Blockwise Multi-Modal Coupler, which decomposes URL-IP features into localized block units and computes cross-modal attention weights at the block level, enabling fine-grained inter-modal interaction. This architecture preserves fine-grained lexical cues and contextual semantics while integrating network-level signals. Our evaluation on large-scale real-world datasets shows that the framework significantly outperforms state-of-the-art baselines on both binary and multi-class classification tasks.
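To make the block-level coupling idea concrete, the following is a minimal sketch of cross-modal attention computed over feature blocks. It is not the CURL-IP implementation: the function names, the scaled dot-product scoring, the absence of learned projections, and the residual-sum fusion are all assumptions made for illustration, and the inputs are treated as plain feature vectors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def split_blocks(features: Vector, block_size: int) -> List[Vector]:
    """Split a flat feature vector into contiguous, equally sized blocks."""
    if block_size <= 0 or len(features) % block_size != 0:
        raise ValueError("feature length must be a positive multiple of block_size")
    return [features[i:i + block_size] for i in range(0, len(features), block_size)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blockwise_cross_attention(
    url_feat: Vector, ip_feat: Vector, block_size: int
) -> Tuple[List[Vector], List[Vector]]:
    """Let each URL block attend over all IP blocks (hypothetical helper).

    Returns the fused URL blocks and the per-block attention weights.
    Scoring is a scaled dot product; fusion is a residual sum. Both are
    illustrative assumptions, not details taken from the abstract.
    """
    url_blocks = split_blocks(url_feat, block_size)
    ip_blocks = split_blocks(ip_feat, block_size)
    scale = math.sqrt(block_size)

    fused: List[Vector] = []
    weights: List[Vector] = []
    for query in url_blocks:
        # Similarity between this URL block and every IP block.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in ip_blocks]
        attn = softmax(scores)
        # Attention-weighted mixture of IP blocks, one value per dimension.
        context = [
            sum(attn[j] * ip_blocks[j][d] for j in range(len(ip_blocks)))
            for d in range(block_size)
        ]
        # Residual fusion keeps the original URL block information.
        fused.append([q + c for q, c in zip(query, context)])
        weights.append(attn)
    return fused, weights
```

Calling `blockwise_cross_attention([1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], 2)` yields two fused blocks, with each URL block placing most of its attention on the IP block it resembles. A trained model would normally apply learned query, key, and value projections before scoring.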