The transition toward software-defined vehicles concentrates an increasing share of vehicle functionality into distributed software services, where failures propagate through service dependencies and the surface symptom is often several causal hops away from the underlying defect. Existing approaches to causal root-cause analysis in such systems address this only partially: they typically reason over a single observability modality and operate in an offline, operator-driven mode that does not match the demands of continuous vehicle operation. This paper presents SDVDiag, a multimodal causal-discovery pipeline that fuses log-based and metric-based service representations into a shared embedding space before graph construction, coupled with an anomaly-driven trigger that converts the diagnostic platform from a manually operated batch tool into a continuously running online system. Evaluation on an Autonomous Valet Parking testbed shows that the multimodal pipeline produces sparser causal graphs than a metrics-only baseline (134 vs. 182 edges on average) and outperforms it in edge-weighted reward against an expert knowledge graph at every stage of human-feedback refinement, reaching a 2.4-fold improvement over the baseline after 60 feedback queries. An end-to-end fault-injection scenario further demonstrates that the integrated trigger correctly recovers a true root cause located two causal hops upstream of the observable symptom.
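To illustrate how an anomaly-driven trigger can turn a batch diagnostic tool into an online one, the following is a minimal Python sketch. It is not SDVDiag's detector: the rolling z-score rule, the class name `AnomalyTrigger`, the `window` and `threshold` parameters, and the `on_anomaly` callback are all assumptions made for illustration. The callback stands in for whatever starts the causal-discovery pipeline.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Callable, Deque


class AnomalyTrigger:
    """Hypothetical rolling z-score trigger (an assumption, not SDVDiag's detector)."""

    def __init__(
        self,
        on_anomaly: Callable[[float], None],
        window: int = 20,
        threshold: float = 3.0,
    ) -> None:
        self.on_anomaly = on_anomaly          # stand-in for "start a diagnosis run"
        self.window: Deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Feed one metric sample; fire the callback if it looks anomalous."""
        fired = False
        if len(self.window) == self.window.maxlen:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            # A flat window has sigma == 0, so the z-score is undefined; skip it.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                self.on_anomaly(value)
                fired = True
        # Anomalous samples are kept out of the baseline so that a sustained
        # fault does not quickly become the new "normal".
        if not fired:
            self.window.append(value)
        return fired
```

In use, each incoming metric sample would be passed to `observe`, and the callback would launch graph construction and root-cause ranking only when a deviation is detected, instead of waiting for an operator to start a run.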