Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessing pipelines, non-standard dataset splits, and incompatible ground-truth labeling and metrics. These discrepancies undermine reproducibility, impede fair comparison, and impose substantial re-implementation overhead on researchers. We present PIDSMaker, an open-source framework for developing and evaluating PIDSs under consistent protocols. PIDSMaker consolidates eight state-of-the-art systems into a modular, extensible architecture with standardized preprocessing and ground-truth labels, enabling consistent experiments and apples-to-apples comparisons. A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes. PIDSMaker also includes utilities for ablation studies, hyperparameter tuning, multi-run instability measurement, and visualization, addressing methodological gaps identified in prior work. We demonstrate PIDSMaker through concrete use cases and release it with preprocessed datasets and labels to support shared evaluation for the PIDS community.