Packer identification tools are a critical foundation of malware analysis, directly affecting unpacking, behavioral analysis, malware classification, and threat attribution. However, their semantic correctness is rarely validated. In practice, a tool may return a plausible packer label that is nevertheless semantically wrong, leading to failed unpacking and unreliable downstream analysis. This paper presents a semantic validation framework for testing and repairing packer identification tools. Our key idea is to use unpackers as executable semantic contracts. If a tool predicts a packer family, the corresponding unpacker should recover analyzable program content. This enables automatic test oracles without requiring manually labeled ground truth. Building on this idea, we develop a systematic pipeline for detecting, localizing, and repairing semantic faults in existing packer identification tools. We then conduct the first large-scale empirical study of semantic bugs in eleven open-source packer identification tools and six proprietary VirusTotal tools. Our results reveal that semantic bugs are widespread and recurring, largely due to incomplete signatures and unstable heuristic logic. After repair, packer identification coverage improves by up to 58.6%, and downstream malware classification performance improves by more than 13.6% on average. These findings show that semantic validation of packer identification tools is essential for building trustworthy malware analysis pipelines.
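The oracle idea above (a predicted packer label should let the matching unpacker recover analyzable content) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Verdict`, `check_label`, `looks_analyzable`, and `toy_xor_unpack` are hypothetical, the "analyzable" criterion (recovered bytes start with the `MZ` magic) is a deliberately crude placeholder, and the `toy-xor` packer is an invented stand-in for a real unpacker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical interface: an unpacker takes raw sample bytes and returns
# the recovered bytes, or None if it cannot unpack the sample.
Unpacker = Callable[[bytes], Optional[bytes]]


@dataclass
class Verdict:
    label: str   # packer family predicted by the identification tool
    status: str  # "consistent", "suspect", or "untested"


def looks_analyzable(data: Optional[bytes]) -> bool:
    # Placeholder criterion (an assumption): recovered content begins with
    # the "MZ" magic of a PE file. A real check would be far stricter.
    return data is not None and data[:2] == b"MZ"


def check_label(sample: bytes, label: str,
                unpackers: Dict[str, Unpacker]) -> Verdict:
    """Use the unpacker for `label` as an executable contract on the label."""
    unpacker = unpackers.get(label)
    if unpacker is None:
        # No contract available for this family, so nothing can be concluded.
        return Verdict(label, "untested")
    try:
        recovered = unpacker(sample)
    except Exception:
        # A crashing unpacker is treated the same as a failed unpack.
        recovered = None
    status = "consistent" if looks_analyzable(recovered) else "suspect"
    return Verdict(label, status)


def toy_xor_unpack(sample: bytes) -> Optional[bytes]:
    # Invented toy format: b"TOYX" header followed by the payload XORed with 0x5A.
    if not sample.startswith(b"TOYX"):
        return None
    return bytes(b ^ 0x5A for b in sample[4:])


UNPACKERS: Dict[str, Unpacker] = {"toy-xor": toy_xor_unpack}

# A sample genuinely packed with the toy format, and one that is not.
packed = b"TOYX" + bytes(b ^ 0x5A for b in b"MZ\x90\x00payload")
plain = b"MZ\x90\x00not packed"

print(check_label(packed, "toy-xor", UNPACKERS))
print(check_label(plain, "toy-xor", UNPACKERS))
print(check_label(packed, "upx", UNPACKERS))
```

In this sketch a "suspect" verdict only flags a label for closer inspection; it cannot by itself distinguish a wrong label from a faulty or incomplete unpacker, which is why a separate localization step would still be needed.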