Hydra: Robust Hardware-Assisted Malware Detection

Malware detection using Hardware Performance Counters (HPCs) offers a promising, low-overhead approach for monitoring program behavior. However, a fundamental architectural constraint, namely that only a limited number of hardware events can be monitored concurrently, creates a significant bottleneck that leads to detection blind spots. Prior work has focused primarily on optimizing machine learning models for a single, statically chosen event set, or on ensembling models over the same feature set. We argue that robustness requires diversifying not only the models but also the underlying feature sets (i.e., the monitored hardware events), so that a broader spectrum of program behavior is captured. This observation motivates the following research question: can detection performance be improved by trading temporal granularity for broader coverage through the strategic scheduling of different feature sets over time? To answer this question, we propose Hydra, a novel detection mechanism that partitions execution traces into time slices and learns an effective schedule of feature sets, with corresponding classifiers, for deployment. By cycling through complementary feature sets, Hydra mitigates the limitations of a fixed monitoring perspective. Our experimental evaluation shows that Hydra significantly outperforms state-of-the-art single-feature-set baselines, achieving a 19.32% improvement in F1 score and a 60.23% reduction in false positive rate. These results underscore the importance of feature-set diversity and establish strategic multi-feature-set scheduling as an effective principle for robust, hardware-assisted malware detection.
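
The idea of cycling feature sets over time slices can be illustrated with a minimal Python sketch. It assumes a schedule has already been learned and that each time slice only exposes readings for the events programmed in that slot. The names ScheduleEntry, score_slices, and detect, the toy threshold classifiers, the illustrative event names, and the mean-score aggregation rule are all hypothetical and not taken from Hydra; the abstract does not specify how per-slice decisions are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# One time slice of HPC readings: event name -> counter value.
# Only the events programmed for that slice are assumed to be present.
Slice = Dict[str, float]


@dataclass
class ScheduleEntry:
    # Hypothetical: a feature set (hardware events) paired with the
    # classifier trained on it; the classifier returns a malware score in [0, 1].
    events: Sequence[str]
    classifier: Callable[[List[float]], float]


def score_slices(trace: List[Slice], schedule: List[ScheduleEntry]) -> List[float]:
    """Score each slice with the schedule entry active at that time (round-robin)."""
    if not schedule:
        raise ValueError("schedule must contain at least one entry")
    scores = []
    for t, readings in enumerate(trace):
        entry = schedule[t % len(schedule)]  # cycle through feature sets over time
        features = [readings[e] for e in entry.events]
        scores.append(entry.classifier(features))
    return scores


def detect(trace: List[Slice], schedule: List[ScheduleEntry], threshold: float = 0.5) -> bool:
    """Assumed aggregation: flag as malware if the mean slice score reaches the threshold."""
    if not trace:
        raise ValueError("trace must contain at least one time slice")
    scores = score_slices(trace, schedule)
    return sum(scores) / len(scores) >= threshold


# Toy schedule with two complementary feature sets (illustrative only).
schedule = [
    ScheduleEntry(("branch-misses", "cache-misses"), lambda x: 1.0 if x[0] > 100 else 0.0),
    ScheduleEntry(("instructions",), lambda x: 0.8 if x[0] < 1000 else 0.1),
]
trace = [
    {"branch-misses": 150, "cache-misses": 20},  # slot 0: first feature set
    {"instructions": 500},                        # slot 1: second feature set
    {"branch-misses": 50, "cache-misses": 10},    # slot 0 again
]

print(score_slices(trace, schedule))  # [1.0, 0.8, 0.0]
print(detect(trace, schedule))        # True
```

Each feature set observes only a fraction of the slices, which is the trade of temporal granularity for coverage that the research question describes; a learned schedule could also repeat entries or use unequal slot lengths rather than strict round-robin.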

翻译：利用硬件性能计数器进行恶意软件检测，为监控程序行为提供了一种低开销且前景广阔的方法。然而，一个根本性的架构限制——即只能同时监控有限数量的硬件事件——造成了显著的瓶颈，导致检测盲区。先前的研究主要集中于为单一、静态选定的事件集优化机器学习模型，或对同一特征集进行模型集成。我们认为，要实现鲁棒性，不仅需要模型多样化，还需要基础特征集（即监控的硬件事件）多样化，以捕获更广泛的程序行为。这一观察引出了以下研究问题：能否通过随时间策略性地调度不同特征集，以牺牲时间粒度换取更广泛的覆盖范围，从而提升检测性能？为回答此问题，我们提出Hydra，一种新颖的检测机制，它将执行轨迹划分为时间片，并学习一个有效的特征集及对应分类器的调度策略用于部署。通过循环使用互补的特征集，Hydra缓解了固定监控视角的局限性。我们的实验评估表明，Hydra显著优于最先进的单特征集基线方法，F1分数提升了19.32%，误报率降低了60.23%。这些结果强调了特征集多样性的重要性，并确立了策略性多特征集调度作为实现鲁棒硬件辅助恶意软件检测的有效原则。