Software systems generate massive volumes of unstructured logs that record execution behavior, failures, and interactions across components. Yet existing log anomaly detection methods treat these logs primarily as flat sequences of templates, overlooking the relational execution structure that governs how events co-occur and evolve over time. We propose a framework that discovers this hidden structure by recovering an execution state machine directly from logs and inducing a corresponding multi-table relational schema that connects traces, events, states, transitions, and parameters. The recovered state machine then serves as a generative prior for producing realistic multi-relational synthetic data that preserves structural, temporal, and process constraints while amplifying rare but valid execution behaviors. We assess the fidelity of the generated data through constraint validation, distributional similarity, and process-level metrics. We demonstrate its usefulness by showing that augmenting real logs with the synthetic relational data significantly improves anomaly and bug detection on held-out real datasets, compared to sequence-based baselines and naive oversampling. Our results show that execution logs implicitly encode a relational database governed by a latent state machine, and that recovering this structure enables principled synthetic data generation for robust and interpretable anomaly detection.
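To make the idea of a state machine acting as a generative prior concrete, the following is a minimal sketch, not the framework's actual method. It assumes each trace has already been parsed into a list of log template IDs, and it substitutes a simple first-order transition model over templates (with hypothetical `START`/`END` markers) for the recovered execution state machine. The function names and the `alpha` flattening exponent are illustrative assumptions. Because sampling only follows transitions observed in real traces, every synthetic trace respects the recovered process constraints, while `alpha < 1` raises the relative weight of rare transitions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical sentinel states marking the beginning and end of a trace.
START, END = "<START>", "<END>"


def recover_state_machine(traces):
    """Count observed transitions between consecutive templates.

    Simplifying assumption: one state per log template (first-order model).
    """
    counts = defaultdict(Counter)
    for trace in traces:
        path = [START, *trace, END]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return counts


def sample_trace(machine, rng, alpha=1.0, max_len=50):
    """Random walk that only takes transitions observed in real traces.

    Successor weights are count ** alpha: alpha = 1 reproduces empirical
    frequencies, alpha < 1 flattens them to amplify rare but valid paths.
    Walks longer than max_len are truncated (relevant only for cyclic machines).
    """
    state, trace = START, []
    while len(trace) < max_len:
        successors = machine[state]
        targets = list(successors)
        weights = [count ** alpha for count in successors.values()]
        state = rng.choices(targets, weights=weights, k=1)[0]
        if state == END:
            break
        trace.append(state)
    return trace


def is_valid(trace, machine):
    """Constraint check: every step, including start and end, was observed."""
    path = [START, *trace, END]
    return all(dst in machine.get(src, {}) for src, dst in zip(path, path[1:]))


if __name__ == "__main__":
    # Toy traces: the "retry" path occurs in only 1 of 10 real traces.
    real = [["open", "read", "close"]] * 9 + [["open", "retry", "read", "close"]]
    machine = recover_state_machine(real)
    rng = random.Random(0)
    synthetic = [sample_trace(machine, rng, alpha=0.0) for _ in range(200)]
    share = sum("retry" in t for t in synthetic) / len(synthetic)
    print(f"valid: {all(is_valid(t, machine) for t in synthetic)}, retry share: {share:.2f}")
```

With `alpha = 0` every observed successor becomes equally likely, so the rare `retry` path appears far more often than its 10% share in the toy input, yet no unobserved transition is ever generated. A fuller version would also attach parameters and timestamps to each step so that the output populates the multi-table schema rather than a single sequence.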