We formulate low-level malware detection by feature-matching algorithms as Order-based Malware Detection with Critical Instructions (General-OMDCI). The input is a pattern, given as a sequence \(M\) of colored blocks, each containing a critical character (representing a unique sequence of critical instructions that is potentially, but not certainly, associated with malware), and a program \(A\), represented as a sequence of \(n\) colored blocks with critical characters. The goal is to find two subsequences, \(M'\) of \(M\) and \(A'\) of \(A\), whose blocks match in color and whose critical characters form a permutation of each other. When \(M\) is a permutation in both colors and critical characters, the problem is called OMDCI. If we additionally require \(M'=M\), the problem is called OMDCI+; if in this case \(d=|M|\) is used as a parameter, then OMDCI+ is easily shown to be FPT. Our main (negative) results concern the case where \(|M|\) is arbitrary and are summarized as follows. OMDCI+ is NP-complete, which implies that OMDCI is also NP-complete. For OMDCI, even the special case of deciding whether the optimal solution has length \(0\) (i.e., deciding whether no part of \(M\) appears in \(A\)) is co-NP-hard. As a result, the OMDCI problem does not admit an FPT algorithm unless P=co-NP. In summary, our results imply that using feature-matching algorithms is hard both for identifying malware in a given low-level program and for determining that the program contains none.
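To make the OMDCI+ definition concrete, the following is a minimal Python sketch of one possible decision procedure under a particular reading of the abstract, not necessarily the argument the authors have in mind. It assumes that a block is a `(color, critical_character)` pair, that \(A'\) must match the colors of \(M\) position by position, and that the critical characters of \(A'\) must be exactly those of \(M\) in some order. The names `Block` and `omdci_plus_match` are hypothetical. The dynamic program tracks subsets of used critical characters, so its running time is roughly \(O(n \cdot d \cdot 2^d)\), which is the kind of bound that would make the problem FPT in \(d=|M|\).

```python
from typing import List, Tuple

# Assumed encoding (not from the source): a block is (color, critical character).
Block = Tuple[str, str]


def omdci_plus_match(M: List[Block], A: List[Block]) -> bool:
    """Decide whether A has a subsequence A' with |A'| = |M| such that
    A'[i] has the same color as M[i] for every i, and the critical
    characters of A' are a permutation of those of M.

    Assumes the critical characters of M are pairwise distinct.
    """
    d = len(M)
    char_index = {ch: i for i, (_, ch) in enumerate(M)}
    if len(char_index) != d:
        raise ValueError("critical characters of M must be distinct")

    full_mask = (1 << d) - 1
    # reachable[i] holds the bitmasks of critical characters that can be
    # used up after matching the first i blocks of M against a prefix of A.
    reachable = [set() for _ in range(d + 1)]
    reachable[0].add(0)

    for color, ch in A:
        idx = char_index.get(ch)
        if idx is None:
            continue  # this character does not occur in M, so skip the block
        bit = 1 << idx
        # Go through pattern positions in descending order so that the
        # current block of A is used at most once per partial match.
        for i in range(d - 1, -1, -1):
            if M[i][0] != color:
                continue
            for mask in list(reachable[i]):
                if not mask & bit:
                    reachable[i + 1].add(mask | bit)

    return full_mask in reachable[d]


if __name__ == "__main__":
    pattern = [("red", "a"), ("blue", "b")]
    program = [("red", "b"), ("green", "x"), ("blue", "a")]
    print(omdci_plus_match(pattern, program))
```

In the usage example, the program contains a red block followed by a blue block whose critical characters `b` and `a` are a permutation of the pattern's `a` and `b`, so the pattern is considered matched under the assumptions stated above.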