Word Embedding Techniques for Malware Evolution Detection

Malware detection is a critical aspect of information security. One difficulty that arises is that malware often evolves over time. To maintain effective malware detection, it is necessary to determine when malware evolution has occurred so that appropriate countermeasures can be taken. We perform a variety of experiments aimed at detecting points in time where a malware family has likely evolved, and we consider secondary tests designed to confirm that evolution has actually occurred. Several malware families are analyzed, each of which includes a number of samples collected over an extended period of time. Our experiments indicate that improved results are obtained using feature engineering based on word embedding techniques. All of our experiments are based on machine learning models, and hence our evolution detection strategies require minimal human intervention and can easily be automated.

Related content

Word vector representations

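A distributed representation encodes language as dense, low-dimensional, continuous vectors. Researchers found early on that learned word embeddings exhibit analogy relationships, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. Methods for learning such embeddings can be trained directly on large unlabeled corpora. Embedding quality also depends heavily on the choice of context window size: a large context window typically yields embeddings that reflect topical information, while a small context window yields embeddings that reflect a word's function and its local contextual semantics.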
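The analogy relationship and the role of the context window can be illustrated with a minimal sketch. The vectors below are hand-built toy values, not learned embeddings, and the names `TOY_VECTORS`, `solve_analogy`, and `context_pairs` are hypothetical helpers introduced only for this illustration. The first function solves man − woman ≈ king − d by vector arithmetic and cosine similarity; the second lists the (center, context) pairs that a window of a given size would expose to an embedding model.

```python
import math

# Hand-built toy vectors (NOT learned embeddings). The dimensions are
# hypothetical: [is_person, is_female, is_royal, is_young].
TOY_VECTORS = {
    "man":   [1.0, 0.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0, 0.0],
    "king":  [1.0, 0.0, 1.0, 0.0],
    "queen": [1.0, 1.0, 1.0, 0.0],
    "boy":   [1.0, 0.0, 0.0, 1.0],
    "girl":  [1.0, 1.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Return the word d for which a - b ≈ c - d, i.e. d ≈ c - a + b."""
    target = [vc - va + vb
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as is conventional for analogy tests.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def context_pairs(tokens, window):
    """List (center, context) pairs within `window` positions on each side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    print(solve_analogy("man", "woman", "king", TOY_VECTORS))
    print(context_pairs(["a", "b", "c"], window=1))
```

With a larger `window`, each center token is paired with more distant neighbors, which is one intuition for why wide windows tend to capture topical rather than functional similarity.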
