An executable binary typically contains a large number of machine instructions. Although the statistics of popular instructions are well known, the distribution of non-popular instructions has been relatively underexplored. Our findings show that an arbitrary group of binaries comes with both i) a similar distribution of common machine instructions, and ii) quite a few rarely appearing instructions (e.g., fewer than five occurrences) that fall outside that distribution. Their infrequency may represent the signature of a code chunk or the footprint of a binary. In this work, we investigate such rare instructions with an in-depth analysis at the source level, classifying them into four categories.
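As a rough illustration of what "rarely appearing" means here, the following minimal sketch counts instruction mnemonics and keeps those seen fewer than five times. It is not the procedure used in this work: the function name `rare_mnemonics`, the `threshold` parameter, and the sample mnemonic list are all hypothetical, and a real pipeline would obtain the mnemonics from a disassembler.

```python
from collections import Counter


def rare_mnemonics(mnemonics, threshold=5):
    """Return {mnemonic: count} for mnemonics seen fewer than `threshold` times.

    `threshold=5` mirrors the "fewer than five occurrences" example in the
    text; it is an illustrative default, not a value fixed by the paper.
    """
    counts = Counter(mnemonics)
    return {m: c for m, c in counts.items() if c < threshold}


# Hypothetical mnemonic stream standing in for a disassembled binary.
sample = ["mov"] * 40 + ["push"] * 12 + ["call"] * 9 + ["cpuid"] + ["rdtsc"] * 2

print(rare_mnemonics(sample))  # {'cpuid': 1, 'rdtsc': 2}
```

Common mnemonics such as `mov` dominate the counts, while the handful of low-count entries are the candidates that would then be examined at the source level.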