Automated incident management is critical for microservice reliability. While recent unified frameworks leverage multimodal data for joint optimization, they unrealistically assume perfect data completeness. In practice, network fluctuations and agent failures frequently cause missing modalities. Existing approaches relying on static placeholders introduce imputation noise that masks anomalies and degrades performance. To address this, we propose ARMOR, a robust self-supervised framework designed for missing modality scenarios. ARMOR features: (i) a modality-specific asymmetric encoder that isolates distribution disparities among metrics, logs, and traces; and (ii) a missing-aware gated fusion mechanism utilizing learnable placeholders and dynamic bias compensation to prevent cross-modal interference from incomplete inputs. By employing self-supervised auto-regression with mask-guided reconstruction, ARMOR jointly optimizes anomaly detection (AD), failure triage (FT), and root cause localization (RCL). AD and RCL require no fault labels, while FT relies solely on failure-type annotations for the downstream classifier. Extensive experiments demonstrate that ARMOR achieves state-of-the-art performance under complete data conditions and maintains robust diagnostic accuracy even with severe modality loss.
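The abstract names the components of the missing-aware gated fusion but does not give its equations, so the following is only a minimal sketch of one way such a mechanism could look, not ARMOR's actual implementation. It assumes each modality has already been encoded into a fixed-size vector, that a missing modality is passed as `None`, that the gate is a softmax over per-modality logits, and that "dynamic bias compensation" can be approximated by a per-modality bias added to the gate logit of a missing input. The class name `MissingAwareGatedFusion` and all parameter names are hypothetical, and the parameters are only initialized here rather than trained.

```python
import numpy as np


class MissingAwareGatedFusion:
    """Illustrative sketch only; every name and design choice is an assumption."""

    def __init__(self, dim, modalities=("metric", "log", "trace"), seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.modalities = tuple(modalities)
        # One placeholder vector per modality. In a real model these would be
        # trainable parameters; here they are just small random vectors.
        self.placeholder = {
            m: rng.normal(scale=0.1, size=dim) for m in self.modalities
        }
        # Gate weights start at zero so every present modality begins with
        # the same gate logit (assumption made to keep the sketch simple).
        self.w_gate = {m: np.zeros(dim) for m in self.modalities}
        self.b_gate = {m: 0.0 for m in self.modalities}
        # Assumed form of bias compensation: a negative offset that lowers
        # the gate logit of a modality whose input is missing.
        self.missing_bias = {m: -2.0 for m in self.modalities}

    def __call__(self, embeddings):
        """Fuse a dict of modality -> vector (or None when missing)."""
        feats, logits = [], []
        for m in self.modalities:
            x = embeddings.get(m)
            missing = x is None
            h = self.placeholder[m] if missing else np.asarray(x, dtype=float)
            logit = float(h @ self.w_gate[m]) + self.b_gate[m]
            if missing:
                logit += self.missing_bias[m]
            feats.append(h)
            logits.append(logit)
        logits = np.array(logits)
        # Numerically stable softmax over modalities gives the gate weights.
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        fused = (weights[:, None] * np.stack(feats)).sum(axis=0)
        return fused, dict(zip(self.modalities, weights))


if __name__ == "__main__":
    fusion = MissingAwareGatedFusion(dim=4)
    # The log modality is missing, so its placeholder is used and down-weighted.
    fused, gates = fusion(
        {"metric": np.ones(4), "log": None, "trace": 3 * np.ones(4)}
    )
    print(gates)
```

In this sketch the placeholder stands in for the absent embedding while the bias keeps it from contributing as much as an observed modality, which is one plausible reading of how cross-modal interference from incomplete inputs could be limited.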