Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

We present a longitudinal, drift-aware evaluation of adversarial robustness across more than a decade of Android applications using static and dynamic feature representations extracted from emulator and real-device executions. The dataset is organized into yearly slices and evaluated under three deployment protocols that emulate realistic learning scenarios: (1) same-year training and testing, (2) cross-year deployment without model updates, and (3) expanding-window retraining with cumulative historical data. Across multiple classifier families, adversarial examples are generated using FGSM and SPSA under feasibility constraints. We measure clean performance, Adversarial Accuracy (AA), Attack Success Rate (ASR), and introduce temporal linkage metrics -- RobustDrop, $\Delta$ASR, and Adversarial Amplification Factor (AAF) -- to quantify the relationship between distribution shift and robustness degradation.

Results show that temporal separation is associated with reduced adversarial robustness under the evaluated transfer-based feature-space setting. As the train-test gap increases, clean accuracy and adversarial accuracy decline, while attack success exhibits configuration-dependent increases, particularly under FGSM perturbations and static features. Expanding-window retraining mitigates, but does not eliminate, robustness loss under continued distributional evolution. These findings indicate that temporal drift should be considered when assessing the long-term robustness of intelligent detection systems under evolving data distributions and highlight the need for drift-aware robustness assessment frameworks in long-lived adversarial environments.
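As an illustration only, the three deployment protocols can be read as rules for pairing training years with a test year. The sketch below is a minimal reading of that description, not the study's actual pipeline. The function names, the example years, the restriction of cross-year testing to strictly later years, and the choice to test the expanding window on the next unseen year are all assumptions made for this example.

```python
from typing import List, Tuple

# A split is (training years, test year). All names here are hypothetical.
Split = Tuple[List[int], int]


def same_year(years: List[int]) -> List[Split]:
    """Protocol 1: train and test within the same yearly slice.

    The within-year partition of samples into train/test is not shown.
    """
    return [([y], y) for y in sorted(years)]


def cross_year(years: List[int]) -> List[Split]:
    """Protocol 2: train on one year, deploy on later years with no update.

    Assumption: only strictly later test years are considered.
    """
    ys = sorted(years)
    return [([tr], te) for tr in ys for te in ys if te > tr]


def expanding_window(years: List[int]) -> List[Split]:
    """Protocol 3: retrain on all years seen so far, test on the next year.

    Assumption: the test year is the first year after the cumulative window.
    """
    ys = sorted(years)
    return [(ys[:i], ys[i]) for i in range(1, len(ys))]


def train_test_gap(split: Split) -> int:
    """Gap in years between the most recent training year and the test year."""
    train_years, test_year = split
    return test_year - max(train_years)


if __name__ == "__main__":
    example_years = [2012, 2013, 2014, 2015]  # hypothetical slice labels
    for name, fn in [("same-year", same_year),
                     ("cross-year", cross_year),
                     ("expanding-window", expanding_window)]:
        for split in fn(example_years):
            print(name, split, "gap =", train_test_gap(split))
```

Under these assumptions, the train-test gap is zero for every same-year split, ranges from one up to the span of the dataset for cross-year splits, and is always one for expanding-window splits. This matches the abstract's framing of cross-year deployment as the setting in which temporal separation grows.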
