Evolution of Log-Based Detection Rules in Public Repositories

Log-based detection rules remain central to modern security operations, encoding domain expertise that analysts iteratively refine to balance detection coverage against alert volume. Yet while prior work has examined the evolution of network intrusion detection signatures, the longitudinal behavior of log-based detection rules has received little empirical study. We present the first longitudinal analysis of detection rule evolution across two widely used repositories: the community-driven Sigma project and the curated Splunk Security Content (SSC). To compare rule versions based on detection logic rather than surface syntax, we introduce a predicate graph intermediate representation that canonicalizes the logical structure of a rule, together with a tree alignment procedure for analyzing changes across revisions. We apply this method to 6,859 rule histories from Sigma and SSC and find that roughly 56% of rules undergo at least one revision to their detection logic. Across rule lifetimes, evolution is predominantly non-monotonic: over half of rules both add and remove clauses over time. We further observe recurring reversions, indicating that changes are often revisited rather than strictly accumulated. Combining structural analysis with LLM-based inference of operational intent, validated by human review, shows that roughly a quarter to a third of rules alternate between expanding coverage and reducing false positives rather than converging toward a stable form. Together, these results reveal that detection rule evolution in public repositories reflects ongoing operational trade-offs rather than steady convergence. Our study raises questions about why rules change the way they do and supports research toward better processes for devising and deploying security rules.
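To make the idea of comparing rule versions by detection logic rather than surface syntax concrete, the following is a minimal illustrative sketch and not the paper's predicate graph or tree alignment procedure. It assumes a hypothetical nested-tuple encoding of a rule's boolean logic, canonicalizes it by flattening, de-duplicating, and sorting clauses, and then labels a rule history by whether predicates were only added, only removed, or both. All names (`canonicalize`, `leaves`, `classify_history`) and the encoding itself are assumptions made for illustration; comparing leaf-predicate sets is far cruder than an alignment over the full structure.

```python
# Hypothetical rule encoding (an assumption, not the paper's representation):
#   leaf:  ("pred", field, op, value)
#   inner: ("and" | "or", [children...])


def canonicalize(node):
    """Flatten nested same-operator nodes, drop duplicates, sort children."""
    if node[0] == "pred":
        return node
    op, children = node
    flat = []
    for child in map(canonicalize, children):
        if child[0] == op:
            flat.extend(child[1])  # merge and(and(a, b), c) into and(a, b, c)
        else:
            flat.append(child)
    unique = sorted(set(flat), key=repr)  # deterministic clause order
    if len(unique) == 1:
        return unique[0]  # a one-child and/or is just that child
    return (op, tuple(unique))


def leaves(node):
    """Return the set of leaf predicates of a canonical node."""
    if node[0] == "pred":
        return frozenset([node])
    return frozenset().union(*(leaves(c) for c in node[1]))


def classify_history(versions):
    """Label a rule history by how its detection logic changed over time."""
    canon = [canonicalize(v) for v in versions]
    revised = added = removed = False
    for old, new in zip(canon, canon[1:]):
        if old == new:
            continue  # syntactic-only change: logic is identical
        revised = True
        added = added or bool(leaves(new) - leaves(old))
        removed = removed or bool(leaves(old) - leaves(new))
    if not revised:
        return "unchanged"
    if added and removed:
        return "mixed"  # non-monotonic: clauses both added and removed
    if added:
        return "add-only"
    if removed:
        return "remove-only"
    return "restructured"  # same predicates, different boolean structure


# Usage with made-up predicates
a = ("pred", "Image", "endswith", "cmd.exe")
b = ("pred", "ParentImage", "endswith", "winword.exe")
c = ("pred", "CommandLine", "contains", "/c")
v1 = ("and", [a, b])
v2 = ("and", [b, ("and", [a])])  # reordered and re-nested, same logic
v3 = ("and", [a, b, c])          # adds a clause
v4 = ("and", [a, c])             # removes a clause
print(classify_history([v1, v2]), classify_history([v1, v3, v4]))
```

In this sketch, a history whose versions differ only in clause order or nesting is labeled `unchanged`, while one that gains a predicate in one revision and loses another later is labeled `mixed`, which loosely corresponds to the non-monotonic evolution described in the abstract.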
