Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessing pipelines, non-standard dataset splits, and incompatible ground-truth labeling and metrics. These discrepancies undermine reproducibility, impede fair comparison, and impose substantial re-implementation overhead on researchers. We present PIDSMaker, an open-source framework for developing and evaluating PIDSs under consistent protocols. PIDSMaker consolidates eight state-of-the-art systems into a modular, extensible architecture with standardized preprocessing and ground-truth labels, enabling consistent experiments and apples-to-apples comparisons. A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes. PIDSMaker also includes utilities for ablation studies, hyperparameter tuning, multi-run instability measurement, and visualization, addressing methodological gaps identified in prior work. We demonstrate PIDSMaker through concrete use cases and release it with preprocessed datasets and labels to support shared evaluation for the PIDS community.