Modern ransomware exhibits polymorphic and evasive behaviors by frequently modifying execution patterns to evade detection. This dynamic nature disrupts feature spaces and limits the effectiveness of static or predefined models. To address this challenge, we propose TL-RL-FusionNet, a reinforcement learning (RL)-guided hybrid framework that integrates frozen dual transfer learning (TL) backbones as feature extractors with a lightweight residual multilayer perceptron (MLP) classifier. The RL agent supervises training by adaptively reweighting samples in response to variations in observable ransomware behavior. Through reward and penalty signals, the agent prioritizes complex cases such as stealthy or polymorphic ransomware employing obfuscation, while down-weighting trivial samples including benign applications with simple file I/O operations or easily classified ransomware. This adaptive mechanism enables the model to dynamically refine its strategy, improving resilience against evolving threats while maintaining strong classification performance. The framework utilizes dynamic behavioral features such as file system activity, registry changes, network traffic, API calls, and anti-analysis checks, extracted from sandbox-generated JSON reports. These features are transformed into RGB images and processed using frozen EfficientNetB0 and InceptionV3 models to capture rich feature representations efficiently. Final classification is performed by a lightweight residual MLP guided by an RL (Q-learning) agent. Experiments on a balanced dataset of 1,000 samples (500 ransomware, 500 benign) show that TL-RL-FusionNet achieves 99.1% accuracy, 98.6% precision, 99.6% recall, and 99.74% AUC, outperforming non-RL baselines by up to 2.5% in accuracy and 3.1% in recall. Efficiency analysis shows 55% lower training time and 59% reduced RAM usage, demonstrating suitability for real-world deployment.
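The RL-guided reweighting described above can be illustrated with a minimal tabular Q-learning sketch. The abstract does not specify the agent's states, actions, or reward, so everything below is an assumption made for illustration: the three difficulty buckets, the candidate weights in `ACTIONS`, the thresholds in `difficulty_state`, the `toy_reward` function, and the hyperparameters are all hypothetical, and the classifier is replaced by random confidence values. It is not the authors' implementation.

```python
import random

# Hypothetical design: the abstract does not define states, actions, or rewards.
ACTIONS = (0.5, 1.0, 2.0)  # candidate per-sample loss weights (assumed)
N_STATES = 3               # 0 = easy, 1 = medium, 2 = hard (assumed buckets)


def difficulty_state(p_true: float) -> int:
    """Bucket a sample by the classifier's probability for its true label."""
    if p_true >= 0.9:
        return 0  # confidently correct: a trivial sample
    if p_true >= 0.6:
        return 1
    return 2      # low confidence or misclassified: a hard sample


class ReweightingAgent:
    """Tabular Q-learning over (difficulty state, weight action) pairs."""

    def __init__(self, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
        self.q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        # Epsilon-greedy selection of a weight index.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state: int, action: int, reward: float, next_state: int) -> None:
        # Standard Q-learning update toward reward + gamma * max_a Q(s', a).
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def toy_reward(state: int, action: int) -> float:
    """Assumed reward: +1 for down-weighting easy or up-weighting hard samples, else -1."""
    preferred = state  # bucket i prefers ACTIONS[i] in this toy setup
    return 1.0 if action == preferred else -1.0


def train_agent(steps: int = 5000, seed: int = 0) -> ReweightingAgent:
    agent = ReweightingAgent(seed=seed)
    rng = random.Random(seed + 1)
    # Random confidences stand in for the real classifier's outputs.
    state = difficulty_state(rng.random())
    for _ in range(steps):
        action = agent.act(state)
        reward = toy_reward(state, action)
        next_state = difficulty_state(rng.random())
        agent.update(state, action, reward, next_state)
        state = next_state
    return agent


agent = train_agent()
# Greedy weight per difficulty bucket after training.
learned = [ACTIONS[row.index(max(row))] for row in agent.q]
print(learned)
```

In a real training loop, the weight chosen for each sample would multiply that sample's loss term before backpropagation through the residual MLP, and the reward would be derived from validation behavior rather than from a fixed rule like `toy_reward`.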