Visualization-based malware detection maps raw binary bytes to grayscale images and applies learned visual classifiers, providing an evasion-resistant and disassembly-free alternative to conventional analysis pipelines. However, executable packing remains a critical failure mode: packed binaries produce high-entropy images that obscure the structural patterns these models rely on. Because packing is also prevalent in benign software (e.g., for compression or copy protection), packing state alone is not a reliable indicator of maliciousness, and existing approaches do not address this challenge within a unified supervised framework. We present ViPER, a Vision-based Packing-Aware Encoder for Robust malware detection. ViPER builds on a LoRA-adapted ViT-B/14 backbone with a dual-head architecture that jointly learns malware classification and packing detection. A packing-aware gating mechanism conditions malware predictions on the inferred packing state, enabling distinct decision boundaries for packed and unpacked inputs. To address packing label skew during training, we employ frequency-weighted losses with stratified sampling over joint class-packing strata. Evaluated on 200,000 Windows PE byteplot images, ViPER achieves a balanced accuracy of 0.8521, ROC-AUC of 0.9260, and AUPR of 0.9279, outperforming representative state-of-the-art baselines across all primary metrics, while attaining a packing detection AUC of 0.9949.
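As an illustration of the byte-to-image mapping and of why packing is problematic, the following is a minimal sketch, not the preprocessing used for ViPER. The function names, the fixed row width of 256, and the zero padding of the last row are all assumptions made for the example; the abstract does not specify these details. The entropy helper shows the quantity that packing pushes toward its 8 bits-per-byte maximum.

```python
import math
from collections import Counter


def bytes_to_byteplot(data: bytes, width: int = 256, pad_value: int = 0) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255).

    Hypothetical helper: width and padding are illustrative choices.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        return []
    height = math.ceil(len(data) / width)
    # Pad the final row so that every row has exactly `width` pixels.
    padded = data + bytes([pad_value]) * (height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A compressed or encrypted section tends to have a near-uniform byte distribution, so its entropy approaches 8 and the corresponding image region looks like noise, whereas code and data sections of unpacked binaries usually score lower and show visible texture.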
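The frequency-weighted losses and stratified sampling over joint class-packing strata can be sketched as follows. This is one common formulation (inverse-frequency weighting), assumed here for illustration; the abstract does not give the exact weighting scheme, and both function names are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class loss weights n / (k * count), so rarer classes weigh more.

    Assumed weighting scheme; with balanced classes every weight is 1.0.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def joint_strata_sample_weights(malware_labels, packed_labels):
    """Per-sample sampling weights over (malware, packed) strata.

    Each observed stratum receives the same total probability mass, which is
    split evenly among its members, so the weights sum to 1.
    """
    if len(malware_labels) != len(packed_labels):
        raise ValueError("label lists must have the same length")
    strata = list(zip(malware_labels, packed_labels))
    counts = Counter(strata)
    return [1.0 / (len(counts) * counts[s]) for s in strata]
```

Weights of this form could be passed to a weighted sampler (for example `random.choices(..., weights=...)` from the standard library) so that each mini-batch draws roughly equally from packed-malicious, packed-benign, unpacked-malicious, and unpacked-benign samples.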