Multimodal Multi-Agent Ransomware Analysis Using AutoGen

Ransomware has become one of the most serious cybersecurity threats, causing major financial losses and operational disruptions worldwide. Traditional detection methods such as static analysis, heuristic scanning, and behavioral analysis often fall short when used alone. To address these limitations, this paper presents a multimodal multi-agent framework for ransomware classification. The proposed architecture combines information from static, dynamic, and network sources. Each data type is handled by a specialized agent that uses autoencoder-based feature extraction. A fusion agent integrates these representations, and a transformer-based classifier then uses the fused representation to identify the specific ransomware family. The agents interact through an inter-agent feedback mechanism that iteratively refines feature representations by suppressing low-confidence information. The framework was evaluated in multiple experiments on large-scale datasets containing thousands of ransomware and benign samples. It outperforms single-modality and non-adaptive fusion baselines, achieving an improvement of up to 0.936 in Macro-F1 for family classification and reducing calibration error. Over 100 epochs, the agentic feedback loop shows stable monotonic convergence, leading to an absolute improvement of more than 0.75 in agent quality and a final composite score of around 0.88, without fine-tuning of the language models. Zero-day ransomware detection remains family-dependent, varying with polymorphism and modality disruptions. Confidence-aware abstention enables reliable real-world deployment by favoring conservative and trustworthy decisions over forced classification. The findings indicate that the proposed approach provides a practical and effective path toward improving real-world ransomware defense systems.
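The two confidence-driven ideas in the abstract, suppressing low-confidence modality information before fusion and abstaining instead of forcing a classification, can be illustrated with a minimal sketch. This is not the paper's implementation: the autoencoder agents and the transformer classifier are replaced by placeholder embeddings and logits, and the function names, the weighted-average fusion rule, the thresholds (0.5 and 0.7), and the family labels are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings, confidences, threshold=0.5):
    """Confidence-gated weighted average of per-modality embeddings.

    embeddings:  dict mapping modality name -> list of floats (equal length)
    confidences: dict mapping modality name -> confidence in [0, 1]
    Modalities below `threshold` (an assumed value) are suppressed entirely.
    Returns None when every modality is suppressed.
    """
    kept = {m: c for m, c in confidences.items() if c >= threshold}
    if not kept:
        return None
    total = sum(kept.values())
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for modality, conf in kept.items():
        weight = conf / total
        for i, value in enumerate(embeddings[modality]):
            fused[i] += weight * value
    return fused


def classify_or_abstain(logits, families, min_confidence=0.7):
    """Return (label, probability); the label is 'abstain' when the top
    class probability falls below `min_confidence` (an assumed value)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "abstain", probs[best]
    return families[best], probs[best]


# Placeholder inputs standing in for the static, dynamic, and network agents.
embeddings = {
    "static": [1.0, 0.0],
    "dynamic": [0.0, 1.0],
    "network": [5.0, 5.0],
}
confidences = {"static": 0.9, "dynamic": 0.9, "network": 0.1}
families = ["family_a", "family_b", "family_c"]  # hypothetical labels

fused = fuse(embeddings, confidences)
confident = classify_or_abstain([5.0, 0.0, 0.0], families)
uncertain = classify_or_abstain([2.0, 1.9, 1.8], families)
```

In this example the low-confidence network embedding is dropped, so the fused vector is the equal-weight average of the static and dynamic embeddings. The sharply peaked logits yield a family label, while the nearly flat logits trigger abstention.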
