The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer-mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.
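The key insight stated in the abstract (normal logs compress into regular byte patterns, while anomalies disrupt them) can be illustrated with a small toy experiment. The sketch below is not CLAD's pipeline: it assumes zlib as a stand-in streaming compressor, uses synthetic log lines invented for illustration, and the helper names `per_line_compressed_sizes` and `demo_lines` are hypothetical. It only measures how many compressed bytes each line emits when all lines share one compressor state.

```python
import zlib


def per_line_compressed_sizes(lines, level=6):
    """Feed lines through ONE streaming zlib compressor and record how many
    compressed bytes each line produces."""
    comp = zlib.compressobj(level)
    sizes = []
    for line in lines:
        chunk = comp.compress(line.encode("utf-8") + b"\n")
        # Z_SYNC_FLUSH forces out all pending output for this line while
        # keeping the compressor's history window, so later lines can still
        # reference earlier ones.
        chunk += comp.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(chunk))
    return sizes


def demo_lines(n_normal=30, n_tail=5):
    """Synthetic (hypothetical) log: templated INFO lines plus one novel ERROR line."""
    def normal(i):
        return (f"2024-01-01 00:00:{i:02d} INFO dfs.DataNode: Received block "
                f"blk_{1000 + i} of size 67108864 from /10.0.0.{i % 5}")

    lines = [normal(i) for i in range(n_normal)]
    anomaly_index = len(lines)
    lines.append("2024-01-01 00:00:30 ERROR dfs.DataNode: java.io.IOException: "
                 "Connection reset by peer; writeBlock blk_1030 aborted unexpectedly")
    lines += [normal(i) for i in range(n_normal + 1, n_normal + 1 + n_tail)]
    return lines, anomaly_index


if __name__ == "__main__":
    lines, idx = demo_lines()
    for i, size in enumerate(per_line_compressed_sizes(lines)):
        marker = "  <-- injected anomaly" if i == idx else ""
        print(f"line {i:2d}: {size:3d} compressed bytes{marker}")
```

After a few warm-up lines, the templated lines are encoded mostly as back-references into the compressor's history and so emit few bytes, whereas the novel line has to be spelled out largely as literals. Per-line size is only a crude scalar signal; according to the abstract, CLAD instead learns from the compressed bytes themselves with a neural encoder.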