Domain Generation Algorithms (DGAs) evolve continuously to evade botnet detection, posing a persistent challenge for dependable network defense. While deep learning-based detectors achieve strong performance under static conditions, they suffer severe degradation when facing temporal drift. Through a 9-year longitudinal study (2017-2025), we empirically show that state-of-the-art character- and word-based DGA classifiers rapidly lose effectiveness as new DGA variants emerge. To address this problem, we propose a drift-resilient Transformer-based framework that learns invariant representations through a hybrid tokenization strategy and multi-task self-supervised pre-training. The model integrates (i) character-level encoding to capture stochastic morphological patterns and (ii) subword-level encoding for word-based DGAs. Three pre-training tasks enable the model to learn robust structural and contextual features prior to supervised fine-tuning. Comprehensive evaluations demonstrate that our method significantly mitigates temporal degradation and consistently outperforms state-of-the-art baselines in forward-chaining experiments. The proposed approach offers a dependable foundation for long-term DGA defense in evolving threat landscapes. Our code is available at: https://github.com/snsec-net/2026-DSN-DRIFT.
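The forward-chaining protocol mentioned in the abstract can be illustrated with a minimal sketch. It assumes an expanding training window over yearly snapshots, which the abstract does not specify; the function name `forward_chaining_splits` and the `min_train` parameter are hypothetical and are not taken from the authors' repository.

```python
from typing import Iterator, List, Sequence, Tuple


def forward_chaining_splits(
    periods: Sequence[int], min_train: int = 1
) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_periods, test_period) pairs in chronological order.

    Assumption: an expanding window, where every period before the test
    period is used for training. Each test period is strictly later than
    all of its training periods, so the model is always evaluated on data
    newer than anything it was trained on.
    """
    ordered = sorted(set(periods))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Years covered by the longitudinal study, per the abstract (2017-2025).
years = range(2017, 2026)
for train_years, test_year in forward_chaining_splits(years):
    print(f"train on {train_years[0]}-{train_years[-1]}, test on {test_year}")
```

Evaluating this way exposes temporal drift directly: a detector that scores well on a random train/test split can still degrade on the later test years, because those years contain DGA variants absent from the training window.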