Log anomaly detection is crucial for uncovering system failures and security risks. Although logs originate from nested component executions with clear boundaries, this structure is lost when they are stored as flat sequences. As a result, state-of-the-art methods risk missing true dependencies within executions while learning spurious ones across unrelated events. We propose KRONE, the first hierarchical anomaly detection framework that automatically derives execution hierarchies from flat logs for modular multi-level anomaly detection. At its core, the KRONE Log Abstraction Model captures application-specific semantic hierarchies from log data. This hierarchy is then leveraged to recursively decompose log sequences into multiple levels of coherent execution chunks, referred to as KRONE Seqs, transforming sequence-level anomaly detection into a set of modular KRONE Seq-level detection tasks. For each test KRONE Seq, KRONE employs a hybrid modular detection mechanism that dynamically routes between an efficient level-independent Local-Context detector, which rapidly filters normal sequences, and a Nested-Aware detector that incorporates cross-level semantic dependencies and supports LLM-based anomaly detection and explanation. KRONE further optimizes hierarchical detection through cached result reuse and early-exit strategies. Experiments on three public benchmarks and one industrial dataset from ByteDance Cloud demonstrate that KRONE achieves consistent improvements in detection accuracy, F1-score, data efficiency, resource efficiency, and interpretability. KRONE improves the F1-score by more than 10 percentage points over prior methods while reducing LLM usage to only a small fraction of the test data.
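The routing, caching, and early-exit ideas described above can be illustrated with a minimal sketch. This is not KRONE's implementation: the `Chunk` class, the `detect` function, the set of known-normal patterns standing in for the Local-Context detector, and the `expensive_detector` callback standing in for the Nested-Aware (possibly LLM-based) detector are all hypothetical names introduced here. The sketch also assumes that a chunk's decomposition is fully determined by its event tuple, so the tuple can serve as the cache key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Events = Tuple[str, ...]


@dataclass
class Chunk:
    """Hypothetical stand-in for one execution chunk in the hierarchy."""
    events: Events
    children: List["Chunk"] = field(default_factory=list)


def detect(
    chunk: Chunk,
    known_normal: Set[Events],
    expensive_detector: Callable[[Chunk], bool],
    cache: Dict[Events, bool],
) -> bool:
    """Return True if the chunk or any nested chunk is judged anomalous."""
    key = chunk.events
    # Cached result reuse: identical chunks are judged only once.
    if key in cache:
        return cache[key]
    # Early exit: an anomalous child makes the parent anomalous,
    # so the remaining children and the parent itself are skipped.
    for child in chunk.children:
        if detect(child, known_normal, expensive_detector, cache):
            cache[key] = True
            return True
    # Cheap filter (assumed here to be a lookup of patterns seen in training).
    if key in known_normal:
        cache[key] = False
        return False
    # Only unresolved chunks reach the expensive detector.
    verdict = expensive_detector(chunk)
    cache[key] = verdict
    return verdict


# Toy usage: count how often the expensive detector is actually invoked.
calls: List[Events] = []


def toy_expensive(chunk: Chunk) -> bool:
    calls.append(chunk.events)
    return "ERROR" in chunk.events


known = {("open", "read", "close"), ("connect", "send")}
root = Chunk(
    ("open", "read", "close", "connect", "ERROR"),
    [Chunk(("open", "read", "close")), Chunk(("connect", "ERROR"))],
)
cache: Dict[Events, bool] = {}
verdict_first = detect(root, known, toy_expensive, cache)
verdict_again = detect(root, known, toy_expensive, cache)
```

In this toy run the expensive detector is called once, for the `("connect", "ERROR")` child; the first child is filtered by the cheap lookup, the root is resolved by early exit, and the second call on the root is answered from the cache.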