Malware detection is a valuable domain to work in because it combines significant real-world impact with distinctive machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they are poorly suited to this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv), which use the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements. Unlike other global convolutional methods, our method requires neither intricate kernel computation nor hand-crafted kernel design: HGConv kernels are defined as simple parameters learned through backpropagation. The proposed method achieves new state-of-the-art (SOTA) results on the Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks. With log-linear complexity in sequence length, HGConv runs substantially faster in our experiments than other methods and scales far more efficiently, even for sequence lengths $\geq 100{,}000$.
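For readers unfamiliar with HRR, the following is a minimal sketch of the standard binding and unbinding primitives, not the HGConv layer itself. It assumes the usual HRR formulation: binding is circular convolution, computed with the FFT in $O(d \log d)$ time, and unbinding uses the approximate (involution) inverse. The function names `bind`, `approx_inverse`, and `unbind`, the dimension `d = 1024`, and the three random key-value pairs are illustrative choices rather than anything specified in the paper.

```python
import numpy as np


def bind(a, b):
    """HRR binding: circular convolution along the last axis, via the FFT."""
    d = a.shape[-1]
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=d)


def approx_inverse(a):
    """HRR involution: a_inv[i] = a[-i mod d], i.e. [a0, a_{d-1}, ..., a1]."""
    return np.roll(a[..., ::-1], 1, axis=-1)


def unbind(trace, key):
    """Approximately recover the value that was bound to `key` inside `trace`."""
    return bind(approx_inverse(key), trace)


# Illustrative setup (assumed, not from the paper): 3 random key-value pairs
# with entries drawn from N(0, 1/d), the usual HRR initialisation.
d = 1024
rng = np.random.default_rng(0)
keys = rng.normal(0.0, 1.0 / np.sqrt(d), size=(3, d))
values = rng.normal(0.0, 1.0 / np.sqrt(d), size=(3, d))

# Encode: bind each key to its value and superpose everything into one vector.
trace = bind(keys, values).sum(axis=0)

# Decode: query with keys[1]; the result is a noisy copy of values[1].
retrieved = unbind(trace, keys[1])

# Identify the stored value by cosine similarity against all candidates.
sims = values @ retrieved / (
    np.linalg.norm(values, axis=1) * np.linalg.norm(retrieved)
)
best = int(np.argmax(sims))
print("similarities:", np.round(sims, 3), "best match index:", best)
```

The retrieved vector is only an approximation of the stored value, so the sketch identifies it by comparing against the known candidates. How HGConv combines these primitives with learned kernels is described in the paper and is not reproduced here.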