Microservice systems (MSS) have become a predominant architectural style for cloud services, yet the community still lacks high-quality, publicly available datasets for anomaly detection (AD) and root cause analysis (RCA) in MSS. Most existing benchmarks emphasize performance-related faults and provide only one or two monitoring modalities, which limits research on broader failure modes and cross-modal methods. To address these gaps, we introduce a new multimodal anomaly dataset built on two open-source microservice systems, SocialNetwork and TrainTicket. We design and inject four categories of anomalies (Ano), namely performance-level, service-level, database-level, and code-level, to emulate realistic anomaly modes. For each scenario, we collect five modalities (Mod): logs, metrics, distributed traces, API responses, and code coverage reports, offering a richer, end-to-end view of system state and inter-service interactions. We name the dataset AnoMod to reflect its anomalies (Ano) and modalities (Mod). AnoMod enables (1) evaluation of cross-modal anomaly detection and fusion/ablation strategies, and (2) fine-grained RCA studies across service and code regions, supporting end-to-end troubleshooting pipelines that jointly consider detection and localization.